Big Data
Apache Hadoop
Process and manage petabyte-scale data with the Hadoop ecosystem.
4 Days | Intermediate | TensorNova Certificate of Completion
Course Curriculum
Module 1: HDFS Architecture
- NameNode, DataNode, and block replication
- HDFS commands and file operations
- Data locality principle
- HDFS high availability with ZooKeeper
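Here is a back-of-the-envelope sketch in plain Java to make block replication concrete. It assumes the default 128 MB block size (dfs.blocksize) and the default replication factor of 3 (dfs.replication) used by recent Hadoop releases. The 1,000 MB file is a made-up example, and real clusters often override both settings.

```java
// Back-of-the-envelope HDFS storage sketch (plain Java, no Hadoop dependency).
// Assumes a 128 MB block size (dfs.blocksize) and replication factor 3
// (dfs.replication); both are configurable per cluster and per file.
public class HdfsBlockMath {
    static final long MB = 1024L * 1024L;

    // Number of HDFS blocks a file occupies: ceiling division, because the
    // last block may be only partially filled (it is not padded on disk).
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Raw cluster capacity consumed: every byte is stored once per replica.
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long file = 1000L * MB;   // hypothetical 1,000 MB input file
        long block = 128L * MB;   // assumed default block size
        int replication = 3;      // assumed default replication factor

        long blocks = blockCount(file, block);
        System.out.println("blocks = " + blocks);
        System.out.println("block replicas = " + blocks * replication);
        System.out.println("raw MB = " + rawBytes(file, replication) / MB);
    }
}
```

For the 1,000 MB file this works out to 8 blocks, 24 block replicas spread across DataNodes, and about 3,000 MB of raw disk. The last block holds only 104 MB and is not padded.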
Module 2: MapReduce
- Map and Reduce paradigm
- Writing MapReduce jobs in Java
- Combiners and partitioners
- MapReduce optimisation techniques
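The map, combine, partition, and reduce stages listed above can be illustrated without a cluster. What follows is a single-process Java sketch (Java 9 or later). It is not the Hadoop org.apache.hadoop.mapreduce API, and the class and method names are hypothetical. Summing counts as they arrive at each "reducer" simplifies what Hadoop actually does, which is to group all values for a key and pass them to one reduce call. The partition formula mirrors Hadoop's default HashPartitioner.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of the MapReduce data flow (NOT the Hadoop API):
// map -> combine (per split) -> partition -> shuffle/sort -> reduce.
public class MiniWordCount {

    // Map: emit (word, 1) for every whitespace-separated token in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : split.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Combiner: pre-aggregate one mapper's output locally so that fewer
    // records would cross the network during the shuffle.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapped) {
        Map<String, Integer> local = new HashMap<>();
        for (Map.Entry<String, Integer> e : mapped) {
            local.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return local;
    }

    // Partitioner: same formula as Hadoop's default HashPartitioner, so every
    // occurrence of a key is routed to the same reducer.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Runs all stages and returns one key-sorted result map per reducer.
    static List<TreeMap<String, Integer>> run(List<String> splits, int numReducers) {
        List<TreeMap<String, Integer>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new TreeMap<>()); // TreeMap stands in for the sort-by-key step
        }
        for (String split : splits) {
            for (Map.Entry<String, Integer> e : combine(map(split)).entrySet()) {
                // Simplified shuffle + reduce: sum partial counts as they arrive.
                reducers.get(partition(e.getKey(), numReducers))
                        .merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return reducers;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("the quick brown fox", "the lazy dog the end");
        List<TreeMap<String, Integer>> result = run(splits, 2);
        for (int r = 0; r < result.size(); r++) {
            System.out.println("reducer " + r + ": " + result.get(r));
        }
    }
}
```

The combiner is safe here because addition is associative and commutative. For a job such as computing an average, the combiner would need to emit partial sums and counts rather than partial averages.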
Module 3: Apache Hive
- HiveQL syntax for big data queries
- Partitioning and bucketing
- ORC and Parquet file formats
- Hive on Tez vs MapReduce
Module 4: HBase & ZooKeeper
- HBase table design and row key strategy
- CRUD operations with the HBase shell and Java API
- ZooKeeper coordination service
- HBase integration with Hive
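Salting is one common row key strategy for time-series data. It prefixes a monotonically increasing key with a hash-derived bucket, so writes spread across regions instead of hotspotting a single one. The sketch below is plain Java with no HBase client calls. The 16-bucket count and the deviceId|timestamp key layout are hypothetical choices for illustration.

```java
import java.nio.charset.StandardCharsets;

// Row-key salting sketch for HBase-style tables (plain Java, no HBase API).
public class SaltedRowKey {
    // Hypothetical bucket count; often chosen to match the number of pre-split regions.
    static final int BUCKETS = 16;

    // Deterministic salt derived from the natural key, so the same logical
    // key always gets the same prefix (required for point lookups later).
    static int salt(String naturalKey) {
        return (naturalKey.hashCode() & Integer.MAX_VALUE) % BUCKETS;
    }

    // Builds "<2-digit salt>|<deviceId>|<timestamp>" (hypothetical layout).
    static String rowKey(String deviceId, long epochMillis) {
        String natural = deviceId + "|" + epochMillis;
        return String.format("%02d|%s", salt(natural), natural);
    }

    // HBase row keys are byte arrays; encode the string form as UTF-8.
    static byte[] rowKeyBytes(String deviceId, long epochMillis) {
        return rowKey(deviceId, epochMillis).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        long t0 = 1_700_000_000_000L;
        // Consecutive timestamps would sort next to each other unsalted;
        // with a salt prefix they are scattered across buckets.
        for (int i = 0; i < 5; i++) {
            System.out.println(rowKey("sensor-42", t0 + i));
        }
    }
}
```

The trade-off shows up on reads. A point lookup can recompute the salt from the full natural key, but a time-range scan must be issued once per bucket and the results merged.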
Module 5: YARN & Ecosystem Tools
- YARN resource management
- Apache Sqoop for RDBMS ingestion
- Apache Flume for log aggregation
- Apache Oozie workflow scheduling
Prerequisites
- Linux command-line skills
- Java programming basics (for MapReduce labs)
- SQL knowledge
Who Should Attend
- Data engineers building big data pipelines
- Database administrators moving to big data
- Software developers working with large datasets
Get Started
Interested in Apache Hadoop?
Our training advisors will help you choose the right batch format, dates, and pricing for your team or individual goals.