Hands-On Big Data Processing with Hadoop 3

English | MP4 | AVC 1920×1080 | AAC 48KHz 2ch | 4h 36m | 966 MB

Perform real-time data analytics, stream and batch processing on your application using Hadoop

Hadoop is one of the best-known open-source frameworks for distributed computing, and learning it is a solid way to ramp up your career and skills. You will start with the basics of Hadoop: its file system, HDFS; its cluster resource manager, YARN; and its many libraries and programming tools. This course gets you started with the major Hadoop components that industry demands, and shows how structured, unstructured, and semi-structured data can be processed with Hadoop.

This course focuses on the problems faced in Big Data and the solutions offered by the respective Hadoop components. You will learn to process raw data with MapReduce and see how tools such as Hive and Pig aid in this process. You will then move on to data analysis techniques with Hadoop, using tools such as Hive, and learn to apply them in a real-world Big Data application. The course will teach you to perform real-time data analytics, and stream and batch processing, on your application. Finally, it will also show you how to extend your analytics solutions to the cloud.

This hands-on course covers all the important aspects of Big Data processing with Hadoop 3. With a good balance between theoretical and practical content, you will gain a complete understanding of the subject.

What You Will Learn

  • A practical introduction to the Hadoop ecosystem and an understanding of each component
  • Understand data storage and data processing in Hadoop using Unix-style shell commands
  • Manage HDFS storage and move data in and out of it
  • Import structured data and query it through Hive
  • Import data from non-RDBMS sources and store it in HDFS
  • Deal with semi-structured and unstructured data through Pig

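To preview the MapReduce model that the course builds on (and that the word-count exercise in the MapReduce section implements with a .jar file), here is a minimal Python sketch of the same idea: a map phase emitting (word, 1) pairs, a shuffle step grouping pairs by key, and a reduce phase summing the counts. This simulates the dataflow locally; the function names are illustrative and not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(lines):
    # Like a WordCount mapper: emit a (word, 1) pair for each word.
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle(pairs):
    # Like the shuffle/sort step: group all values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Like a WordCount reducer: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["hadoop stores data in hdfs",
         "yarn schedules jobs on hadoop"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["hadoop"])  # 2
```

In a real Hadoop job the shuffle is performed by the framework between distributed mappers and reducers; only the map and reduce logic is written by you.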
Table of Contents

What Is Hadoop
1 The Course Overview
2 Introduction to Hadoop
3 Introduction to Hadoop Distributed File System
4 HDFS Architecture and Features
5 Replication and Rack Awareness
6 Anatomy of a File Read/Write on HDFS

Making Hadoop Efficient – YARN Architecture
7 The Rise of Resource Manager
8 YARN Architecture
9 How YARN Has Effectively Increased the Potential of Hadoop
10 Classic versus YARN
11 YARN Daemons
12 Containers
13 Speculative Execution
14 HDFS Federation
15 Authentication and High Availability
16 Understanding the Major Changes in Different Versions of Hadoop – 1.x, 2.x, and 3.x

Analyze Data with MapReduce Basics
17 What Is MapReduce
18 MapReduce Framework, Architecture, and Use Cases
19 Input Splits
20 Assigning Word Count with a .jar File

Analyzing Structured Data with Hadoop
21 Why We Need to Analyze Data with Hive
22 What Is Hive
23 Hive Architecture
24 Warehouse Directory and Metastore
25 Hive Query Language
26 Managed and External Tables

Efficient Data Transfer with Sqoop
27 How Are We Going to Learn
28 Importing Data from RDBMS to HDFS
29 Exporting Data from HDFS to RDBMS

Managing Data Collection and Transfer with Flume
30 What Is Flume
31 Flume Architecture
32 Preparing Flume for Fetching the Data
33 Fetch the Data from Twitter in HDFS

Perform Data Execution with Pig
34 Pig Background
35 Pig Architecture
36 Pig Latin Basics
37 Pig Execution Modes
38 Pig Processing – Loading and Transforming Data
39 Pig Built-in Functions
40 Filtering, Grouping, and Sorting Data
41 Relational Join Operators