Data Science in Python: Unsupervised Learning

English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 202 lectures (16h 46m) | 5.90 GB

Learn Python for Data Science & Machine Learning, and build unsupervised learning models with fun, hands-on projects!

This is a hands-on, project-based course designed to help you master the foundations of unsupervised learning in Python.

We’ll start by reviewing the data science workflow, discussing the techniques & applications of unsupervised learning, and walking through the data prep steps required for modeling. You’ll learn how to set the correct row granularity for modeling, apply feature engineering techniques, select relevant features, and scale your data using normalization and standardization.
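As a rough illustration of the scaling step described above, here is a minimal sketch using scikit-learn’s `MinMaxScaler` (normalization) and `StandardScaler` (standardization). The small age and salary table is made up for this example and is not taken from the course materials.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical employee features (assumption): columns are [age, salary]
X = np.array([[25, 40_000],
              [35, 60_000],
              [45, 80_000]], dtype=float)

# Normalization: rescale each column to the [0, 1] range
X_norm = MinMaxScaler().fit_transform(X)

# Standardization: shift each column to mean 0 and scale to unit variance
X_std = StandardScaler().fit_transform(X)

print(X_norm)
print(X_std)
```

Without scaling, the salary column would dominate any distance-based model simply because its values are numerically larger.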

From there we’ll fit, tune, and interpret 3 popular clustering models using scikit-learn. We’ll start with K-Means Clustering, learn to interpret the output’s cluster centers, and use inertia plots to select the right number of clusters. Next, we’ll cover Hierarchical Clustering, where we’ll use dendrograms to identify clusters and cluster maps to interpret them. Finally, we’ll use DBSCAN to detect clusters and noise points and evaluate the models using their silhouette score.
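To make the K-Means part of this concrete, the sketch below fits scikit-learn’s `KMeans` for several values of k, records the inertia for each (the numbers you would plot to pick k), and then computes a silhouette score for one fitted model. The synthetic three-blob data set and the chosen range of k are assumptions made for illustration; hierarchical clustering and DBSCAN are not shown here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data (assumption): three well-separated 2-D blobs of 50 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit K-Means for k = 1..6 and store the inertia
# (within-cluster sum of squared distances) for each k
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

# Refit with the k suggested by the "elbow" and inspect the result
best = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
score = silhouette_score(X, best.labels_)

print(inertias)
print(best.cluster_centers_)  # one row per cluster, one column per feature
print(score)                  # closer to 1 means tighter, better-separated clusters
```

On real data the elbow is rarely this clear, which is why inertia is usually read alongside the silhouette score and domain knowledge.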

We’ll also use DBSCAN and Isolation Forests for anomaly detection, a common application of unsupervised learning models for identifying outliers and anomalous patterns. You’ll learn to tune and interpret the results of each model and visualize the anomalies using pair plots.
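A minimal sketch of both anomaly detection approaches is shown below, using scikit-learn’s `IsolationForest` and `DBSCAN`. The data set (one dense cloud plus a single far-away point) and the parameter values (`contamination`, `eps`, `min_samples`) are assumptions chosen for illustration, not settings from the course.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest

# Synthetic data (assumption): 100 points near the origin plus one distant point
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(scale=0.5, size=(100, 2)), [[8.0, 8.0]]])

# Isolation Forest: predict() returns -1 for anomalies and 1 for normal points
iso = IsolationForest(contamination=0.01, random_state=0).fit(X)
iso_labels = iso.predict(X)

# DBSCAN: points that belong to no dense region receive the label -1 (noise)
db_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

print(np.where(iso_labels == -1)[0])  # row indices flagged by the Isolation Forest
print(np.where(db_labels == -1)[0])   # row indices DBSCAN treats as noise
```

Both models use -1 to mark unusual points, but for different reasons: the Isolation Forest flags points that are easy to isolate with random splits, while DBSCAN flags points without enough close neighbors.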

Next, we’ll introduce the concept of dimensionality reduction, discuss its benefits for data science, and explore the stages in the data science workflow in which it can be applied. We’ll then cover two popular techniques: Principal Component Analysis, which is great for both feature extraction and data visualization, and t-SNE, which is ideal for data visualization.
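As an illustration of the PCA half of this section, the sketch below standardizes a small synthetic data set, projects it onto two principal components with scikit-learn’s `PCA`, and reads the explained variance ratio. The data is invented for this example (two of its three features are nearly duplicates); t-SNE is left out of the sketch.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): features 1 and 2 share the same underlying signal,
# feature 3 is independent noise
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([
    base,
    base + rng.normal(scale=0.1, size=(200, 1)),
    rng.normal(size=(200, 1)),
])

# PCA is sensitive to feature scale, so standardize first
X_scaled = StandardScaler().fit_transform(X)

# Keep two components and project the data onto them
pca = PCA(n_components=2).fit(X_scaled)
X_2d = pca.transform(X_scaled)

ratios = pca.explained_variance_ratio_
print(ratios)        # share of total variance captured by each component
print(ratios.sum())  # share retained after dropping the third dimension
print(X_2d.shape)
```

Because two of the three features are largely redundant, two components retain nearly all of the variance here; on real data the retained share depends on how correlated the features are.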

Last but not least, we’ll introduce recommendation engines, and you’ll practice creating both content-based and collaborative filtering recommenders using techniques such as Cosine Similarity and Singular Value Decomposition.
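The two ideas named above can be sketched in a few lines: cosine similarity between item feature vectors for a content-based recommender, and a truncated SVD of a user-item matrix for collaborative filtering. The tiny feature and rating matrices below are hypothetical, and `TruncatedSVD` is one of several ways to compute the decomposition in scikit-learn.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Content-based: hypothetical item attributes (rows = items, columns = tags)
item_features = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
], dtype=float)
item_sim = cosine_similarity(item_features)  # item-by-item similarity matrix

# Collaborative: hypothetical ratings (rows = users, columns = items, 0 = unrated)
ratings = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

# Compress each user into 2 latent factors, then compare users in that space
svd = TruncatedSVD(n_components=2, random_state=0)
user_factors = svd.fit_transform(ratings)
user_sim = cosine_similarity(user_factors)

print(item_sim.round(2))
print(user_sim.round(2))
```

A recommendation would then come from the most similar items (content-based) or from items liked by the most similar users (collaborative).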

Throughout the course you’ll play the role of an Associate Data Scientist for the HR Analytics team at a software company trying to increase employee retention. Using the skills you learn throughout the course, you’ll use Python to segment the employees, visualize the clusters, and recommend next steps to increase retention.

COURSE OUTLINE:

Intro to Data Science

Introduce the fields of data science and machine learning, review essential skills, and introduce each phase of the data science workflow

Unsupervised Learning 101

Review the basics of unsupervised learning, including key concepts, types of techniques and applications, and its place in the data science workflow

Pre-Modeling Data Prep

Recap the data prep steps required to apply unsupervised learning models, including restructuring data, engineering & scaling features, and more

Clustering

Apply three different clustering techniques in Python and learn to interpret their results using metrics, visualizations, and domain expertise

Anomaly Detection

Understand where anomaly detection fits in the data science workflow, and apply techniques like Isolation Forests and DBSCAN in Python

Dimensionality Reduction

Use techniques like Principal Component Analysis (PCA) and t-SNE in Python to reduce the number of features in a data set while preserving as much of the original information as possible

Recommenders

Recognize the variety of approaches for creating recommenders, then apply unsupervised learning techniques in Python, including Cosine Similarity and Singular Value Decomposition (SVD)

What you’ll learn

  • Master the foundations of unsupervised Machine Learning in Python, including clustering, anomaly detection, dimensionality reduction, and recommenders
  • Prepare data for modeling by applying feature engineering, selection, and scaling
  • Fit, tune, and interpret three types of clustering algorithms: K-Means Clustering, Hierarchical Clustering, and DBSCAN
  • Use unsupervised learning techniques like Isolation Forests and DBSCAN for anomaly detection
  • Apply and interpret two types of dimensionality reduction models: Principal Component Analysis (PCA) and t-SNE
  • Build recommendation engines using content-based and collaborative filtering techniques, including Cosine Similarity and Singular Value Decomposition (SVD)

Table of Contents

Getting Started
1 Course Introduction
2 About This Series
3 Course Structure & Outline
4 READ ME Important Notes for New Students
5 DOWNLOAD Course Resources
6 Introducing the Course Project
7 Setting Expectations
8 Jupyter Installation & Launch

Intro to Data Science
9 Section Introduction
10 What is Data Science
11 Data Science Skill Set
12 What is Machine Learning
13 Common Machine Learning Algorithms
14 Data Science Workflow
15 Step 1 Scoping a Project
16 Step 2 Gathering Data
17 Step 3 Cleaning Data
18 Step 4 Exploring Data
19 Step 5 Modeling Data
20 Step 6 Sharing Insights
21 Unsupervised Learning
22 Key Takeaways
23 Intro to Data Science

Unsupervised Learning 101
24 Section Introduction
25 Unsupervised Learning 101
26 Unsupervised Learning Techniques
27 Unsupervised Learning Applications
28 Structure of This Course
29 Unsupervised Learning Workflow
30 Key Takeaways
31 Unsupervised Learning 101

Pre-Modeling Data Prep
32 Section Introduction
33 Data Prep for Unsupervised Learning
34 Setting the Correct Row Granularity
35 DEMO Group By
36 DEMO Pivot
37 ASSIGNMENT Setting the Correct Row Granularity
38 SOLUTION Setting the Correct Row Granularity
39 Preparing Columns for Modeling
40 Identifying Missing Data
41 Handling Missing Data
42 Converting to Numeric
43 Converting to DateTime
44 Extracting DateTime
45 Calculating Based on a Condition
46 Dummy Variables
47 ASSIGNMENT Preparing Columns for Modeling
48 SOLUTION Preparing Columns for Modeling
49 Feature Engineering
50 Feature Engineering During Data Prep
51 Applying Calculations
52 Binning Values
53 Identifying Proxy Variables
54 Feature Engineering Tips
55 ASSIGNMENT Feature Engineering
56 SOLUTION Feature Engineering
57 Excluding Identifiers From Modeling
58 Feature Selection
59 ASSIGNMENT Feature Selection
60 SOLUTION Feature Selection
61 Feature Scaling
62 Normalization
63 Standardization
64 ASSIGNMENT Feature Scaling
65 SOLUTION Feature Scaling
66 Key Takeaways
67 Pre-Modeling Data Prep

Clustering
68 Section Introduction
69 Clustering Basics
70 K-Means Clustering
71 K-Means Clustering in Python
72 DEMO K-Means Clustering in Python
73 Visualizing K-Means Clustering
74 Interpreting K-Means Clustering
75 Visualizing Cluster Centers
76 ASSIGNMENT K-Means Clustering
77 SOLUTION K-Means Clustering
78 Inertia
79 Plotting Inertia in Python
80 DEMO Plotting Inertia in Python
81 ASSIGNMENT Inertia Plot
82 SOLUTION Inertia Plot
83 Tuning a K-Means Model
84 DEMO Tuning a K-Means Model
85 ASSIGNMENT Tuning a K-Means Model
86 SOLUTION Tuning a K-Means Model
87 Selecting the Best Model
88 DEMO Selecting the Best Model
89 ASSIGNMENT Selecting the Best K-Means Model
90 SOLUTION Selecting the Best K-Means Model
91 Hierarchical Clustering
92 Dendrograms in Python
93 Agglomerative Clustering in Python
94 DEMO Agglomerative Clustering in Python
95 Cluster Maps in Python
96 DEMO Cluster Maps in Python
97 ASSIGNMENT Hierarchical Clustering
98 SOLUTION Hierarchical Clustering
99 DBSCAN
100 DBSCAN in Python
101 Silhouette Score
102 Silhouette Score in Python
103 DEMO DBSCAN and Silhouette Score in Python
104 ASSIGNMENT DBSCAN
105 SOLUTION DBSCAN
106 Comparing Clustering Algorithms
107 Clustering Next Steps
108 DEMO Compare Clustering Models
109 DEMO Label Unseen Data
110 Key Takeaways
111 Clustering

PROJECT Clustering Clients
112 Project Overview
113 SOLUTION Data Prep
114 SOLUTION K-Means Clustering
115 SOLUTION Hierarchical Clustering
116 SOLUTION DBSCAN
117 SOLUTION Compare, Recommend and Predict

Anomaly Detection
118 Section Introduction
119 Anomaly Detection Basics
120 Anomaly Detection Approaches
121 Anomaly Detection Workflow
122 Isolation Forests
123 Isolation Forests in Python
124 Visualizing Anomalies
125 Tuning and Interpreting Isolation Forests
126 ASSIGNMENT Isolation Forests
127 SOLUTION Isolation Forests
128 DBSCAN for Anomaly Detection
129 DBSCAN for Anomaly Detection in Python
130 Visualizing DBSCAN Anomalies
131 ASSIGNMENT DBSCAN for Anomaly Detection
132 SOLUTION DBSCAN for Anomaly Detection
133 Comparing Anomaly Detection Algorithms
134 RECAP Clustering and Anomaly Detection
135 Key Takeaways
136 Anomaly Detection

Dimensionality Reduction
137 Section Introduction
138 Dimensionality Reduction Basics
139 Why Reduce Dimensions
140 Dimensionality Reduction Workflow
141 Principal Component Analysis
142 Principal Component Analysis in Python
143 Explained Variance Ratio
144 DEMO PCA and Explained Variance Ratio in Python
145 ASSIGNMENT Principal Component Analysis
146 SOLUTION Principal Component Analysis
147 Interpreting PCA
148 DEMO Interpreting PCA
149 ASSIGNMENT Interpreting PCA
150 SOLUTION Interpreting PCA
151 Feature Selection vs Feature Extraction
152 PCA Next Steps
153 T-SNE
154 T-SNE in Python
155 ASSIGNMENT T-SNE
156 SOLUTION T-SNE
157 PCA vs t-SNE
158 DEMO Dimensionality Reduction and Clustering
159 ASSIGNMENT T-SNE & K-Means Clustering
160 SOLUTION T-SNE & K-Means Clustering
161 Key Takeaways
162 Dimensionality Reduction

Recommenders
163 Section Introduction
164 Recommenders Basics
165 Content-Based Filtering
166 Cosine Similarity
167 Cosine Similarity in Python
168 Making a Content Based Filtering Recommendation
169 ASSIGNMENT Content-Based Filtering
170 SOLUTION Content-Based Filtering
171 Collaborative Filtering
172 User-Item Matrix
173 ASSIGNMENT User-Item Matrix
174 SOLUTION User-Item Matrix
175 Singular Value Decomposition
176 Singular Value Decomposition in Python
177 ASSIGNMENT Singular Value Decomposition
178 SOLUTION Singular Value Decomposition
179 Choosing the Number of Components
180 DEMO Choosing the Number of Components
181 ASSIGNMENT Choosing the Number of Components
182 SOLUTION Choosing the Number of Components
183 Making a Collaborative Filtering Recommendation
184 DEMO Making a Collaborative Filtering Recommendation
185 ASSIGNMENT Collaborative Filtering
186 SOLUTION Collaborative Filtering
187 Recommender Next Steps
188 DEMO Hybrid Approach
189 Key Takeaways
190 Recommenders

PROJECT Recommending Restaurants
191 Project Overview
192 SOLUTION Data Prep
193 SOLUTION TruncatedSVD
194 SOLUTION Cosine Similarity
195 SOLUTION Recommendations

Unsupervised Learning Review
196 Section Introduction
197 Unsupervised Learning Flow Chart
198 Unsupervised Learning Techniques & Applications
199 Unsupervised Learning in the Data Science Workflow
200 Key Takeaways

Final Project
201 Final Project Overview
202 SOLUTION Data Prep & EDA
203 SOLUTION Clustering
204 SOLUTION PCA
205 SOLUTION Clustering (Round 2)
206 SOLUTION PCA (Round 2)
207 SOLUTION EDA on Clusters
208 SOLUTION Recommendations

Next Steps
209 BONUS LESSON