Big Data

Apache Hadoop

Process and manage petabyte-scale data with the Hadoop ecosystem.

4 Days | Intermediate | TensorNova Certificate of Completion

Course Curriculum

Module 1: HDFS Architecture

NameNode, DataNode, and block replication
HDFS commands and file operations
Data locality principle
HDFS high availability with ZooKeeper

Module 2: MapReduce

Map and Reduce paradigm
Writing MapReduce jobs in Java
Combiners and partitioners
MapReduce optimisation techniques

Module 3: Apache Hive

HiveQL syntax for big data queries
Partitioning and bucketing
ORC and Parquet file formats
Hive on Tez vs MapReduce

Module 4: HBase & ZooKeeper

HBase table design and row key strategy
CRUD operations with HBase Shell and Java API
ZooKeeper coordination service
HBase integration with Hive

Module 5: YARN & Ecosystem Tools

YARN resource management
Apache Sqoop for RDBMS ingestion
Apache Flume for log aggregation
Apache Oozie workflow scheduling

Prerequisites

Linux command-line skills
Java programming basics (for MapReduce labs)
SQL knowledge

Who Should Attend

Data engineers building big data pipelines
Database administrators moving to big data
Software developers working with large datasets

Get Started

Interested in Apache Hadoop?

Our training advisors will help you choose the right batch format, dates, and pricing for your team or individual goals.
