Contents
About the Big Data Hadoop Online Training:
This five-week training course delivers the key concepts and expertise participants need to ingest and process data on a Hadoop cluster using the most up-to-date tools and techniques, including Apache Spark, MapReduce, HDFS, Hive, Sqoop, and HBase.
Prerequisites:
- A programming language (Java, etc.)
- RDBMS concepts (SQL)
- Fundamentals of Linux (commands)
Hadoop Online Course Content
Hadoop Introduction
Introduction to Hadoop and the Hadoop Ecosystem
- Problems with Traditional Large-scale Systems
- Hadoop!
- The Hadoop Ecosystem
- Hadoop Architecture and HDFS
- Distributed Processing on a Cluster
- Storage: HDFS Architecture
- Storage: Using HDFS
- Resource Management: YARN Architecture
- Resource Management: Working with YARN
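The storage topics above can be illustrated with a toy model of how HDFS splits a file into fixed-size blocks and replicates each block across several datanodes. This is a simplified sketch only, not the real HDFS logic: the tiny block size, the round-robin placement, and the datanode names are all hypothetical (real HDFS defaults to 128 MB blocks, a replication factor of 3, and rack-aware placement):

```python
# Toy model of HDFS block placement (illustration only, not the real HDFS logic).
BLOCK_SIZE = 4          # bytes per block here; real HDFS default is 128 MB
REPLICATION = 3         # copies of each block; matches the real HDFS default
DATANODES = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical datanode names

def place_blocks(data: bytes):
    """Split `data` into blocks and assign each block to REPLICATION datanodes."""
    placement = {}
    for i in range(0, len(data), BLOCK_SIZE):
        block_id = i // BLOCK_SIZE
        # Round-robin placement; real HDFS is rack-aware and far more sophisticated.
        nodes = [DATANODES[(block_id + r) % len(DATANODES)] for r in range(REPLICATION)]
        placement[block_id] = {"bytes": data[i:i + BLOCK_SIZE], "replicas": nodes}
    return placement

layout = place_blocks(b"hello world!")   # 12 bytes -> 3 blocks of 4 bytes each
for block_id, info in layout.items():
    print(block_id, info["bytes"], info["replicas"])
```

The point of the sketch: losing any single datanode still leaves two replicas of every block, which is why HDFS tolerates node failure without data loss.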
Importing Relational Data with Apache Sqoop
- Sqoop Overview
- Basic Imports and Exports
- Limiting Results
- Improving Sqoop’s Performance
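The import topics above map directly onto Sqoop command-line flags. The sketch below assumes a running Hadoop cluster with Sqoop installed and a reachable MySQL database; the JDBC URL, credentials, table, and paths are hypothetical, while the flags shown (`--table`, `--columns`, `--where`, `--num-mappers`, `--split-by`) are standard Sqoop import options:

```shell
# Basic import of one table from an RDBMS into HDFS (hypothetical connection details)
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username analyst --password-file /user/analyst/.pw \
  --table orders \
  --target-dir /data/orders

# Limiting results: import only selected columns and rows
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username analyst --password-file /user/analyst/.pw \
  --table orders \
  --columns "id,total" \
  --where "order_date >= '2020-01-01'" \
  --target-dir /data/orders_2020

# Improving performance: raise the number of parallel map tasks, e.g.
#   --num-mappers 8   (needs a --split-by column if the table has no primary key)
```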
Introduction to Hive
- Why Use Hive?
- Comparing Hive to Traditional Databases
- Hive Use Cases
- Modeling and Managing Data with Hive
- Data Storage Overview
- Creating Databases and Tables
- Loading Data into Tables
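The table-creation and data-loading topics above can be sketched in HiveQL. This assumes a running Hive service; the database, table, column, and path names are hypothetical:

```sql
-- Create a database and a delimited text table (hypothetical names)
CREATE DATABASE IF NOT EXISTS sales;

CREATE TABLE IF NOT EXISTS sales.orders (
  id        INT,
  customer  STRING,
  total     DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load a file already in HDFS into the table
-- (moves the file into Hive's warehouse directory)
LOAD DATA INPATH '/data/orders.csv' INTO TABLE sales.orders;

-- Query it like a traditional database table
SELECT customer, SUM(total) FROM sales.orders GROUP BY customer;
```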
Apache Spark
Apache Spark is the next-generation successor to MapReduce. Spark is a powerful, open-source processing engine for data in the Hadoop cluster, optimized for speed, ease of use, and sophisticated analytics. The Spark framework supports streaming data processing and complex, iterative algorithms, enabling applications to run up to 100x faster than traditional Hadoop MapReduce programs.
Parallel Programming with Spark
- Review: Spark on a Cluster
- RDD Partitions
- Partitioning of File-based RDDs
- HDFS and Data Locality
- Executing Parallel Operations
- Stages and Tasks
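The partitioning and parallel-operation topics above can be pictured with a plain-Python toy model (this is not the Spark API): a dataset is split into partitions, and a narrow transformation such as `map` becomes one independent task per partition, which is the basic unit Spark schedules within a stage:

```python
# Toy model of RDD partitioning and per-partition tasks (plain Python, not Spark).
def partition(data, num_partitions):
    """Split a list into roughly equal partitions, as Spark does for file-based RDDs."""
    size = -(-len(data) // num_partitions)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def run_map(partitions, fn):
    """A narrow transformation: each 'task' processes one partition independently."""
    return [[fn(x) for x in part] for part in partitions]

parts = partition(list(range(10)), 3)
squared = run_map(parts, lambda x: x * x)
print(parts)     # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(squared)   # each inner list is the output of one independent task
```

Because no task needs data from another partition, all of them can run in parallel across the cluster; operations that do need data from other partitions (e.g. grouping by key) force a shuffle and start a new stage.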
Spark Caching and Persistence
- RDD Lineage
- Caching Overview
- Distributed Persistence
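The lineage and caching topics above boil down to one trade-off: an uncached dataset is rebuilt from its lineage every time an action touches it, while a cached one is materialized once and reused. A toy model (plain Python, not the Spark API) that counts recomputations makes the difference concrete:

```python
# Toy model of RDD lineage and caching (plain Python, not the Spark API).
class ToyRDD:
    """A dataset defined by its lineage: a recipe for (re)building its contents."""
    def __init__(self, compute):
        self._compute = compute   # how to rebuild this dataset from its lineage
        self._cache = None
        self.computations = 0     # how many times the lineage was actually run

    def collect(self):
        if self._cache is not None:        # cached: served without recomputation
            return self._cache
        self.computations += 1
        return self._compute()

    def cache(self):
        self._cache = self._compute()      # materialize once, reuse afterwards
        self.computations += 1
        return self

rdd = ToyRDD(lambda: [x * x for x in range(5)])
rdd.collect(); rdd.collect()
print(rdd.computations)    # 2 -> rebuilt from lineage on every action

cached = ToyRDD(lambda: [x * x for x in range(5)]).cache()
cached.collect(); cached.collect()
print(cached.computations) # 1 -> computed once, then served from the cache
```

Lineage is also Spark's fault-tolerance mechanism: if a cached partition is lost, it can always be recomputed from this recipe.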
Common Patterns in Spark Data Processing
- Common Spark Use Cases
- Iterative Algorithms in Spark
- Graph Processing and Analysis
- Machine Learning
- Example: k-means
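The k-means example listed above is the classic iterative algorithm that Spark handles well, since the same dataset is revisited on every pass. A minimal plain-Python version clustering 1-D points shows the two steps that repeat each iteration (the data and initial centers are illustrative; in Spark the loop body would run over a cached RDD):

```python
# Minimal k-means on 1-D points (plain Python; Spark would distribute the loop body).
def kmeans(points, centers, iterations=10):
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(ps) / len(ps) if ps else c for c, ps in clusters.items()]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 10.0, 10.2, 9.8]   # two obvious groups near 1 and 10
print(kmeans(data, centers=[0.0, 5.0]))    # converges near [1.0, 10.0]
```

Because each iteration rereads the full dataset, caching the points in memory (the previous section) is what makes this kind of algorithm fast in Spark.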
Preview: Spark SQL
- Spark SQL and the SQL Context
- Creating DataFrames
- Transforming and Querying DataFrames
- Saving DataFrames
- Comparing Spark SQL with Impala
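The DataFrame topics above can be sketched with PySpark. This fragment requires a Spark installation and is illustrative only; the data, column names, and output path are hypothetical. It uses the `SparkSession` entry point, which in current Spark versions subsumes the older `SQLContext` named above:

```python
# Requires a Spark installation (e.g. `pip install pyspark`); illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("preview").getOrCreate()

# Creating a DataFrame from local data (hypothetical columns)
df = spark.createDataFrame([("alice", 34), ("bob", 29)], ["name", "age"])

# Transforming and querying
df.filter(df.age > 30).select("name").show()

# Saving a DataFrame
df.write.mode("overwrite").parquet("/tmp/people.parquet")
```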
Apache HBase
Apache HBase is a distributed, scalable, NoSQL database built on Apache Hadoop. HBase can store data in massive tables consisting of billions of rows and millions of columns, serve data to many users and applications in real time, and provide fast, random read/write access to users and applications.
HBase Concepts
- Use cases and scenarios for HBase, Hadoop, and RDBMS
- Using the HBase shell to directly manipulate HBase tables
- Designing optimal HBase schemas for efficient data storage and retrieval
- Connecting to HBase with the Java API to insert and retrieve data in real time
- Best practices for identifying and resolving performance bottlenecks
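The HBase shell bullet above can be sketched as a short session. This assumes a cluster node with HBase installed; the table name, column family, row keys, and values are hypothetical, while the commands themselves (`create`, `put`, `get`, `scan`, `disable`, `drop`) are standard HBase shell commands:

```
# Start the shell on a cluster node, then create and query a table
hbase shell

create 'users', 'info'                      # table 'users' with column family 'info'
put 'users', 'row1', 'info:name', 'Alice'   # insert one cell
put 'users', 'row1', 'info:email', 'alice@example.com'
get 'users', 'row1'                         # fast random read of a single row
scan 'users'                                # full-table scan
disable 'users'
drop 'users'                                # cleanup
```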