Big Data with Hadoop and Spark
Program Objectives
Who Needs This?
Data Scientist
Digital Analyst
Analytics Engineer
Digital Marketing Manager
IT Manager
What Will You Learn?
◾ What is Big Data?
◾ Impact of Big Data
◾ Parallel Processing, Scaling, and Data Parallelism
◾ Big Data Tools and Ecosystem
◾ Open Source and Big Data
◾ Beyond the Hype
◾ Big Data Use Cases
◾ Introduction to Hadoop
◾ Introduction to MapReduce
◾ Hadoop Ecosystem
◾ HDFS
◾ Hive
◾ HBase
◾ Why use Apache Spark?
◾ Functional Programming Basics
◾ Parallel Programming using Resilient Distributed Datasets
◾ Scale out / Data Parallelism in Apache Spark
◾ DataFrames and Spark SQL
◾ RDDs in Parallel Programming and Spark
◾ DataFrames and Datasets
◾ Catalyst and Tungsten
◾ ETL with DataFrames
◾ Real-world usage of Spark SQL
◾ Apache Spark Architecture
◾ Overview of Apache Spark Cluster Modes
◾ How to Run an Apache Spark Application
◾ Using Apache Spark on IBM Cloud
◾ Setting Apache Spark Configuration
◾ Running Spark on Kubernetes
◾ The Apache Spark User Interface
◾ Monitoring Application Progress
◾ Debugging Apache Spark Application Issues
◾ Understanding Memory Resources
◾ Understanding Processor Resources