Mastering Big Data Analytics with PySpark: A comprehensive guide to performing efficient Advanced Analytics with PySpark

English | MP4 | AVC 1920×1080 | AAC 48KHz 2ch | 8h 07m | 1.64 GB

Effectively apply Advanced Analytics to large datasets using the power of PySpark

PySpark helps you perform data analysis at scale and build more scalable analyses and pipelines. This course starts by introducing you to PySpark’s potential for performing effective analyses of large datasets. You’ll learn how to interact with Spark from Python and connect Jupyter to Spark to provide rich data visualizations. After that, you’ll delve into Spark’s architecture and its various components.

You’ll learn to work with Apache Spark and perform ML tasks more smoothly than before. You’ll gather and query data using Spark SQL to overcome the challenges involved in reading it. You’ll use the DataFrame API to work with Spark MLlib and learn about the Pipeline API. Finally, the course provides tips and tricks for deploying your code and tuning performance.

By the end of this course, you will not only be able to perform efficient data analytics but will also have learned to use PySpark to analyze large datasets at scale in your organization.

Learn

  • Gain a solid knowledge of vital Data Analytics concepts via practical use cases
  • Create elegant data visualizations using Jupyter
  • Run, process, and analyze large datasets using PySpark
  • Utilize Spark SQL to easily load big data into DataFrames
  • Create fast and scalable Machine Learning applications using MLlib with Spark
  • Perform exploratory Data Analysis in a scalable way
  • Achieve scalable, high-throughput, and fault-tolerant processing of data streams using Spark Streaming

Table of Contents

Python and Spark – A Match Made in Heaven
1 Course Overview
2 Python versus Spark
3 Preparing for the Course
4 Connecting Jupyter to Spark

Working with PySpark
5 Getting to Know Spark
6 The Power of Spark
7 The Power of Spark MLlib
8 Spark DataFrames
9 Spark Data Operations

Preparing Data Using Spark SQL
10 Loading Data from CSV Files
11 Fixing Issues in Our Data – Part One
12 Fixing Issues in Our Data – Part Two
13 Grouping, Joining, and Aggregating – Part One
14 Grouping, Joining, and Aggregating – Part Two

Machine Learning with Spark MLlib
15 Machine Learning with Spark
16 Building a Recommendation System with Spark MLlib – Part One
17 Building a Recommendation System with Spark MLlib – Part Two
18 Building a Recommendation System with Spark MLlib – Part Three
19 Finalizing Our Recommendation System
20 What We Have Learned So Far

Classification and Regression
21 Machine Learning with Spark
22 Machine Learning Pipelines
23 Running a Logistic Regression Pipeline
24 Parameters, Features, and Persistence
25 Frequent Pattern Mining and Statistics

Analyzing Big Data
26 Natural Language Processing with Spark
27 Identifying Our Data
28 Data Preparation and Exploration
29 Creating Our Raw Training Data

Processing Natural Language in Spark
30 Data Preparation and Regular Expressions
31 Data Cleaning and Transformation
32 Training a Sentiment Analysis Model – Part One
33 Training a Sentiment Analysis Model – Part Two

Machine Learning in Real-Time
34 Fetching Data from Twitter
35 Spark Structured Streaming
36 Managing and Converting Streams
37 Assembling Our Streaming ML Solution
38 A Structured Approach to ML Streaming

The Power of PySpark
39 Running Spark in Production
40 Running Spark at Scale
41 Tips, Tricks, and Take-Aways