Big Data Processing with Apache Spark

English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 3h 30m | 664 MB

Efficiently tackle large data sets and big data analysis challenges using Spark and Python

Processing big data in real time is challenging because of the demands of scalability, information consistency, and fault tolerance. Big Data Processing with Apache Spark teaches you how to use Spark to make your overall analytical workflow faster and more efficient. You’ll explore the core concepts and tools within the Spark ecosystem, such as Spark Streaming and its API, the machine learning extension, and Structured Streaming.

You’ll begin by learning data processing fundamentals using the Resilient Distributed Datasets (RDDs), SQL, Datasets, and DataFrames APIs. After grasping these fundamentals, you’ll move on to using the Spark Streaming APIs to consume data in real time from TCP sockets, and to integrating Amazon Web Services (AWS) for stream consumption.

By the end of this course, you’ll not only understand how to use the machine learning extensions and structured streams, but you’ll also be able to apply Spark in your own upcoming big data projects.

What You Will Learn

  • Write your own Python programs that can interact with Spark
  • Implement data stream consumption using Apache Spark
  • Recognize common operations in Spark to process known data streams
  • Integrate Spark Streaming with Amazon Web Services
  • Create a collaborative filtering model with Python and the MovieLens dataset
  • Apply processed data streams to Spark machine learning APIs

Table of Contents

01 Course Overview
02 Installation and Setup
03 Lesson Overview
04 Introduction to Spark and Resilient Distributed Datasets
05 Operations Supported by the RDD API
06 Map Reduce Operations
07 Self-Contained Python Spark Programs
08 Nested Functions and Standalone Python Programs
09 Introduction to SQL, Datasets, and DataFrames
10 Lesson Summary
11 Lesson Overview
12 Introduction to Streaming Architectures
13 Introduction to Discretized Streams (DStreams)
14 Operations Supported by the Spark Streaming API
15 Windowing Operations
16 Structured Streaming
17 Lesson Summary
18 Lesson Overview
19 Spark Integration with AWS Services
20 Integrating AWS Kinesis and Python
21 AWS S3 Basic Functionality
22 Kinesis Streams and Spark Streams
23 Lesson Summary
24 Lesson Overview
25 Spark Integration with Machine Learning
26 Spark Streaming Windowing Operations
27 Lesson Summary