Implement Hadoop Big Data
What You Will Learn
◾ The architecture of a Hadoop cluster
◾ What are High Availability and Federation?
◾ How to set up a production cluster
◾ Various shell commands in Hadoop
◾ Understanding configuration files in Hadoop
◾ Installing a single-node cluster with Cloudera Manager
◾ Understanding Spark, Scala, Sqoop, Pig, and Flume
◾ Introducing Big Data and Hadoop
◾ What is Big Data and where does Hadoop fit in?
◾ Two important Hadoop ecosystem components: MapReduce and HDFS
◾ The Hadoop Distributed File System in depth (replication, block size, Secondary NameNode, High Availability) and YARN in depth (ResourceManager and NodeManager)
◾ Why use Apache Spark?
◾ Functional Programming Basics
◾ Parallel Programming using Resilient Distributed Datasets
◾ Scale out / Data Parallelism in Apache Spark
◾ DataFrames and Spark SQL
◾ Indexing in Hive
◾ The Map-Side Join in Hive
◾ Working with complex data types
◾ The Hive user-defined functions
◾ Introduction to Impala
◾ Comparing Hive with Impala
◾ The detailed architecture of Impala