Clean Data: Tips, Tricks, and Techniques

English | MP4 | AVC 1920×1080 | AAC 48KHz 2ch | 1h 31m | 260 MB

Use Python to check your data consistency and get rid of any missing or duplicate data

“Give me six hours to chop down a tree and I will spend the first four sharpening the axe.” Do you apply the same principle when doing Data Science?

Effective data cleaning is one of the most important aspects of good Data Science. It involves acquiring raw data and preparing it for analysis; if this is not done well, you will not get the accuracy or results you’re looking for, no matter how good your algorithm is.
Data cleaning is often the hardest part of big data and ML work. This course equips you with the skills you need to clean your data in Python using tried and tested techniques. You’ll find a range of tips and tricks that help you get the job done in a smart, easy, and efficient way.

Each section teaches one particular aspect of the overall topic, as reflected in its title. Each video teaches a subtopic in a hands-on way, with a practical demonstration, an explanation, and a discussion of how it works and how to use it.

What You Will Learn

  • Learn to spot outliers in your data and analyze sensor data to find omissions.
  • Tokenize data and clean stop words to make it more robust.
  • Analyze and extract features from unstructured text data.
  • Clean and handle duplicates in your big data analytics and statistics.
  • Find and remove global row duplicates.
  • Learn to handle data cleaning for numbers.

Table of Contents

Identifying the Most Important Data Issues
1 The Course Overview
2 Setting Up the Work Environment
3 Finding Outliers in the Input Data
4 Reconcile Missing Values to Give Data More Meaning
5 Implementing and Testing the IQR Method
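
As an illustration of the IQR method named in this section, here is a minimal sketch that assumes pandas is installed. The function name `iqr_outliers`, the multiplier `k = 1.5`, and the sample readings are illustrative assumptions, not material taken from the course.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value is an IQR outlier."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    # Values beyond k * IQR from the quartiles are flagged as outliers.
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return (values < lower) | (values > upper)


# Hypothetical sensor readings with one obviously suspicious value.
readings = pd.Series([10.1, 9.8, 10.3, 10.0, 55.0, 9.9])
mask = iqr_outliers(readings)
outliers = readings[mask]
```

Here `outliers` contains only the reading 55.0; whether such a value should be dropped or investigated depends on the data set.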

Cleaning Text Data
6 Tokenizing Input Data
7 Cleaning Stop Words
8 Removing Data-Specific Words That Have a Negative Impact
9 Handling White Spaces and Language-Agnostic Phrases
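
The tokenizing and stop-word topics above can be sketched with the standard library alone. The stop-word set below is a tiny hypothetical list chosen for the example, and the regular expression is one possible tokenization rule; the course may use a dedicated NLP library instead.

```python
import re

# Hypothetical, deliberately small stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens.

    Runs of white space and punctuation are discarded by the pattern.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens: list[str], stop_words=STOP_WORDS) -> list[str]:
    """Drop tokens that appear in the stop-word set."""
    return [token for token in tokens if token not in stop_words]


tokens = tokenize("The  sensor is   reporting an error in the log")
cleaned = remove_stop_words(tokens)
```

Data-specific words (for example, a product name that appears in every record) could be handled the same way by passing a larger `stop_words` set.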

Dealing with Unstructured Data (Text)
10 Analyzing Unstructured Text Input Data
11 Extracting Features from Data and Transforming Text into Vector
12 Bag-Of-Words
13 Reducing Noise in Data by Using Skip-Gram
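
To make the bag-of-words idea concrete, the following sketch turns short documents into count vectors using only the standard library. The function name `bag_of_words`, the white-space tokenization, and the sample sentences are assumptions made for this example.

```python
from collections import Counter


def bag_of_words(documents: list[str]) -> tuple[list[str], list[list[int]]]:
    """Build a sorted vocabulary and one count vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for doc in tokenized for token in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        # Counter returns 0 for tokens that do not occur in this document.
        vectors.append([counts[token] for token in vocabulary])
    return vocabulary, vectors


vocabulary, vectors = bag_of_words(["clean data wins", "dirty data loses data"])
```

Each vector has one position per vocabulary word, so word order is lost and only the counts remain.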

Duplicates
14 Analyzing Rows – Finding Duplicate Columns
15 Finding Global Row Duplicates
16 Handling Duplicates by Implementing Idempotent Processing
17 Duplicates That Have Meaning
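
A possible way to approach the duplicate topics above with pandas is sketched below. The DataFrame contents and column names are hypothetical, and the idempotency check is just one simple reading of "idempotent processing": running the cleaning step a second time should change nothing.

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 1, and "city_copy" repeats "city".
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "city": ["Oslo", "Rome", "Rome", "Lima"],
    "city_copy": ["Oslo", "Rome", "Rome", "Lima"],
})

# Global row duplicates: True for every repeat of an earlier, identical row.
duplicate_rows = df.duplicated()

# Duplicate columns: transpose so that columns become rows, then reuse duplicated().
duplicate_columns = df.columns[df.T.duplicated().to_numpy()].tolist()


def deduplicate(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop repeated rows and renumber the index."""
    return frame.drop_duplicates().reset_index(drop=True)


deduped = deduplicate(df)
# Idempotent: applying the same step again yields an identical result.
is_idempotent = deduplicate(deduped).equals(deduped)
```

Whether a repeated row is an error or a meaningful repeat (for example, two identical purchases) has to be decided per data set before dropping anything.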

Reasoning about Types and Default
18 Interpreting Not a Number – Cleaning for Numeric Data
19 Replacing NaN with Scalar Data
20 Backward Fill and Forward Fill
21 Replacing Generic Values
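
The NaN-handling topics in this section map onto a few pandas calls, sketched here under the assumption that pandas and NumPy are installed. The sample values and the sentinel `-999.0` standing in for a "generic value" are hypothetical.

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, np.nan, 3.0, np.nan])

# Replace every NaN with a fixed scalar.
filled_scalar = values.fillna(0.0)

# Forward fill: carry the last valid value forward.
forward = values.ffill()

# Backward fill: pull the next valid value backward; a trailing NaN stays NaN.
backward = values.bfill()

# Replace a generic placeholder (a hypothetical -999.0 sentinel) with NaN
# so that it is treated as missing rather than as a real measurement.
raw = pd.Series([20.5, -999.0, 21.0])
with_missing = raw.replace(-999.0, np.nan)
```

Which strategy is appropriate depends on the data: forward fill suits slowly changing time series, while a scalar default can distort averages.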