Big Data with Hadoop and Spark

Program Objectives

Deep insight into the impact of Big Data including use cases, tools, and processing methods.

Knowledge of the Apache Hadoop architecture, ecosystem, and practices, and the use of applications including HDFS, HBase, Spark, and MapReduce.

Know-how to apply Spark programming basics, including parallel programming with DataFrames, Datasets, and Spark SQL.

Proficiency with Spark’s RDDs and Datasets, the use of Catalyst and Tungsten to optimize Spark SQL, and Spark’s development and runtime environment options.

Who Needs This?

Data Scientist

Digital Analyst

Analytics Engineer

Digital Marketing Manager

IT Manager

What Will You Learn?

◾ What is Big Data?
◾ Impact of Big Data
◾ Parallel Processing, Scaling, and Data Parallelism
◾ Big Data Tools and Ecosystem
◾ Open Source and Big Data
◾ Beyond the Hype
◾ Big Data Use Cases

◾ Introduction to Hadoop
◾ Intro to MapReduce
◾ Hadoop Ecosystem
◾ HDFS
◾ Hive
◾ HBase

◾ Why use Apache Spark?
◾ Functional Programming Basics
◾ Parallel Programming using Resilient Distributed Datasets
◾ Scale out / Data Parallelism in Apache Spark
◾ DataFrames and Spark SQL

◾ RDDs in Parallel Programming and Spark
◾ DataFrames and Datasets
◾ Catalyst and Tungsten
◾ ETL with DataFrames
◾ Real-world usage of Spark SQL

◾ Apache Spark Architecture
◾ Overview of Apache Spark Cluster Modes
◾ How to Run an Apache Spark Application
◾ Using Apache Spark on IBM Cloud
◾ Setting Apache Spark Configuration
◾ Running Spark on Kubernetes

◾ The Apache Spark User Interface
◾ Monitoring Application Progress
◾ Debugging Apache Spark Application Issues
◾ Understanding Memory Resources
◾ Understanding Processor Resources

What Investment Is Required?

IDR 10,000,000 per participant