Apache Spark Essential Training: Big Data Engineering

English | MP4 | AVC 1280×720 | AAC 44 kHz 2ch | 1h 02m | 182 MB

Data engineering is the foundation for building analytics and data science applications on big data. It requires combining multiple big data technologies to construct data pipelines and networks that stream, process, and store data. This course focuses on building full-fledged solutions that combine Apache Spark with other big data tools to create end-to-end data pipelines. Instructor Kumaran Ponnambalam begins by defining data engineering, its functions, and its concepts. Next, he covers how Spark capabilities such as parallel processing, execution plans, state management options, and machine learning support extract, transform, load (ETL) workloads. He then introduces batch processing use cases and processes, as well as real-time processing pipelines. After walking through several best practices, Kumaran concludes with an end-to-end exercise project.

Table of Contents

Introduction
1 Driving big data engineering with Apache Spark
2 Course prerequisites
3 Setting up the exercise files

1. Data Engineering Concepts
4 What is data engineering
5 Data engineering vs. data analytics vs. data science
6 Data engineering functions
7 Batch vs. real-time processing
8 Data engineering with Spark

2. Spark Capabilities for ETL
9 Spark architecture review
10 Parallel processing with Spark
11 Spark execution plan
12 Stateful stream processing
13 Spark analytics and ML

3. Batch Processing Pipelines
14 Batch processing use case: Problem statement
15 Batch processing use case: Design
16 Setting up the local DB
17 Uploading stock to a central store
18 Aggregating stock across warehouses

4. Real-Time Processing Pipelines
19 Real-time use case: Problem
20 Real-time use case: Design
21 Generating a visits data stream
22 Building a website analytics job
23 Executing the real-time pipeline

5. Data Engineering with Spark Best Practices
24 Batch vs. real-time options
25 Scaling extraction and loading operations
26 Scaling processing operations
27 Building resiliency

6. End-to-End Exercise Project
28 Project exercise requirements
29 Solution design
30 Extracting long last actions
31 Building a scorecard

Conclusion
32 More about Apache Spark
