Exploratory Data Analysis with R (Video)

English | MP4 | AVC 1920×1080 | AAC 48KHz 2ch | 4h 43m | 1.83 GB

Harness the skills to analyze your data effectively with EDA and R

Most mistakes and failures in data analysis come from inadequate Exploratory Data Analysis (EDA). Without EDA skills, you risk drawing incorrect, and potentially harmful, conclusions from your data.

In this course, you will learn how EDA helps you make better sense of your data, draw sound conclusions, and apply the right techniques. We’ll begin with a brief introduction to EDA, its importance, and its advantages over BI tools. Using R libraries like dplyr and ggplot2, we will generate insights, formulate relevant questions for investigation, and communicate the results effectively using visualizations. You will learn how to spot missing data and errors, validate assumptions, and identify patterns that help you understand the problem. Based on this, you’ll be able to select an appropriate ML model for your data.
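As a rough illustration of that workflow, the sketch below counts missing values, builds a grouped summary with dplyr, and draws a histogram with ggplot2. It is not taken from the course: the `airquality` data set (bundled with R) and the names `missing_counts`, `ozone_by_month`, and `ozone_plot` are assumptions chosen for the example.

```r
library(dplyr)
library(ggplot2)

# Assumption: airquality (shipped with R) stands in for your own data.
data(airquality)

# Spot missing data: count the NA values in every column.
missing_counts <- airquality %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Summarise one variable per group, ignoring missing values.
ozone_by_month <- airquality %>%
  group_by(Month) %>%
  summarise(
    n_obs      = n(),
    n_missing  = sum(is.na(Ozone)),
    mean_ozone = mean(Ozone, na.rm = TRUE)
  )

# Visualise the distribution of a numerical variable.
ozone_plot <- ggplot(airquality, aes(x = Ozone)) +
  geom_histogram(bins = 20, na.rm = TRUE) +
  labs(title = "Distribution of ozone readings", x = "Ozone", y = "Count")

print(missing_counts)
print(ozone_by_month)
```

Printing `ozone_plot` in an interactive session renders the histogram. Keeping the missing-value counts next to the grouped means makes it easier to judge how far each summary can be trusted.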

By the end of the course, you will be able to quickly get to know and interpret the various kinds of data sets you are presented with, and understand how to handle them so that they are ready for further modeling activities.

Learn

  • Set up your data and code to avoid mistakes and ensure reproducibility
  • Really understand the structure and content of your data
  • Build clear plots to evaluate the distribution of your data with ggplot2
  • Construct summaries of your variables with dplyr
  • Implement data cleaning and validation tasks to get your data ready for data mining activities
  • Test a hypothesis or check assumptions related to a specific model
  • Estimate parameters and figure the margins of error
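The last item in the list above, estimating a parameter and its margin of error, can be sketched in base R as follows. This is a minimal illustration rather than the course's own material: the `mtcars` data set (bundled with R), the 95% confidence level, and the variable names are assumptions made for the example.

```r
# Assumption: mtcars$mpg (shipped with R) stands in for your own sample.
x <- mtcars$mpg
n <- length(x)

# Point estimate of the population mean.
mean_mpg <- mean(x)

# Standard error of the mean.
std_error <- sd(x) / sqrt(n)

# Margin of error for a 95% confidence interval, using the t distribution
# because the population standard deviation is unknown.
t_crit <- qt(0.975, df = n - 1)
margin_of_error <- t_crit * std_error

# Confidence interval: estimate plus or minus the margin of error.
ci <- c(lower = mean_mpg - margin_of_error,
        upper = mean_mpg + margin_of_error)

print(mean_mpg)
print(margin_of_error)
print(ci)
```

The interval computed by hand should agree with the one reported by `t.test(x)`, which is a convenient cross-check.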

Table of Contents

Setting the Stage – How to Organize Your EDA Working Area
1 The Course Overview
2 Setting Up RStudio Project
3 Organizing Project Structure (To Ensure Reproducibility)
4 Coding Best Practice (Right Choices for Better Results)
5 Using Git to Avoid Messing Up Your Analyses
6 Pro Tip – Tweak RStudio Project Options to Force Reproducibility

Investigate the Structure of Your Data
7 Import Your Data
8 Discover the Structure of Your Data with the str() and glimpse() Functions
9 Have a Look at the Data with head(), tail(), and top_n()
10 Pro Tips – The Importance of Variable Types

Don’t Step on the Eggs – Check the Quality of Your Data Before Using Them
11 Defining the Required Data Quality
12 Spotting Missing Values
13 Handling Missing Values
14 Discovering Incoherent Records
15 Spotting Outliers in Your Data
16 Handling Incoherent Records Using Smell Tests
17 Applying Censoring and Flooring to Your Data
18 Final Roundup – Data Quality Problems and Viable Remedial Actions

Summarizing Data and Investigating Distributions
19 Distribution and Summary Statistics
20 Summary Statistics for Categorical Variables
21 Distribution Visualization for Categorical Data
22 Summary Statistics for Numerical Variables
23 Distribution Visualization for Numerical Variables

Investigating Relationships and Patterns among Variables
24 Summary Statistics for Correlation
25 Visual Relationship among Data
26 Time Related Patterns in Data
27 Finding Structural Breaks in Data
28 Pro Tip – Correlation Does Not Imply Causation

Testing Model Assumptions
29 Need for Model Assumptions
30 Testing Sample Representativeness (The Most Common Assumption)
31 Applying Linear Regression
32 Applying Logistic Regression

Building Quick EDA Lean Report
33 Leveraging R Markdown Notebook
34 Defining the Key Message and Arranging the Report Accordingly
35 Ensuring Data Lineage in Your Report
36 Share Your Report
37 Pro Tip – Using PaletteR to Quickly Build Colour Palettes