Apache Hadoop

Process and manage petabyte-scale data with the Hadoop ecosystem.

4 Days | Intermediate | TensorNova Certificate of Completion

Course Curriculum

Module 1: HDFS Architecture
  • NameNode, DataNode, and block replication
  • HDFS commands and file operations
  • Data locality principle
  • HDFS high availability with ZooKeeper
Module 2: MapReduce
  • Map and Reduce paradigm
  • Writing MapReduce jobs in Java
  • Combiners and partitioners
  • MapReduce optimisation techniques
Module 3: Apache Hive
  • HiveQL syntax for big data queries
  • Partitioning and bucketing
  • ORC and Parquet file formats
  • Hive on Tez vs MapReduce
Module 4: HBase & ZooKeeper
  • HBase table design and row key strategy
  • CRUD operations with HBase Shell and Java API
  • ZooKeeper coordination service
  • HBase integration with Hive
Module 5: YARN & Ecosystem Tools
  • YARN resource management
  • Apache Sqoop for RDBMS ingestion
  • Apache Flume for log aggregation
  • Apache Oozie workflow scheduling

Prerequisites

  • Linux command-line skills
  • Java programming basics (for MapReduce labs)
  • SQL knowledge

Who Should Attend

  • Data engineers building big data pipelines
  • Database administrators moving to big data
  • Software developers working with large datasets

Interested in Apache Hadoop?

Our training advisors will help you choose the right batch format, dates, and pricing for your team or individual goals.