Preprocessing Data with NumPy

English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 6.5 Hours | 2.43 GB

NumPy, ndarrays, Slicing, Random Generators, Importing and Saving Data, Statistics, Data Manipulation, Preprocessing

The problem

Most data analyst, data science, and coding courses miss a crucial practical step. They don’t teach you how to work with raw data, how to clean and preprocess it. This creates a sizeable gap between the skills you need on the job and the abilities you have acquired in training. Truth be told, real-world data is messy, so you need to know how to overcome this obstacle to become an independent data professional.

The bootcamps we have seen online, and even live classes, neglect this aspect and only show you how to work with ‘clean’ data. But this isn’t doing you a favor. In reality, it will set you back both when you are applying for jobs and when you’re on the job.

The solution

Our goal is to provide you with complete preparation using the NumPy package. This course will turn you into a capable data analyst with a solid understanding of one of the most prominent computing packages in the world. To take you there, we will cover the following topics extensively.

  • The ndarray class and why we use it
  • The type of data arrays usually contain
  • Slicing and squeezing datasets
  • Dimensions of arrays, and how to reduce them
  • Generating pseudo-random data
  • Importing data from external text files
  • Saving/Exporting data to external files
  • Computing the statistics of the dataset (max, min, mean, variance, etc.)
  • Data cleaning
  • Data preprocessing
  • Final practical example
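
As a rough illustration of a few of the topics above (slicing, squeezing, pseudo-random data, and basic statistics), here is a minimal sketch. The array values, the seed, and all variable names are assumptions made up for this example and are not taken from the course materials.

```python
import numpy as np

# Hypothetical 3x4 dataset; the values are chosen only for illustration.
data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [5.0, 6.0, 7.0, 8.0],
                 [9.0, 10.0, 11.0, 12.0]])

# Slicing: first two rows, last two columns.
block = data[:2, 2:]               # shape (2, 2)

# Slicing with a range keeps the axis, so one column comes back as (3, 1).
column = data[:, 1:2]

# Squeezing drops the length-1 axis and leaves a 1-D array.
flat_column = np.squeeze(column)   # shape (3,)

# Pseudo-random data from a seeded generator, so results are reproducible.
rng = np.random.default_rng(seed=365)
noise = rng.normal(loc=0.0, scale=1.0, size=data.shape)

# Statistics: per-column means and the variance of the whole dataset.
col_means = data.mean(axis=0)
overall_var = data.var()
```

Passing `axis=0` computes one mean per column; leaving the axis out, as in `data.var()`, reduces over every element.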

Each of these subjects builds on the previous ones. And this is precisely what makes our curriculum so valuable. Everything is shown in the right order and we guarantee that you are not going to get lost along the way, as we have provided all necessary steps in video (not a single one skipped). In other words, we are not going to teach you how to concatenate datasets before you know how to index or slice them.

So, to prepare you for the long journey towards a data science position, we created a course that will show you all the tools for the job: the Preprocessing Data with NumPy course.

We believe that this resource will significantly boost your chances of landing a job, as it will prepare you for practical tasks and concepts that are frequently included in interviews.

NumPy is Python’s fundamental package for scientific computing. It has established itself as the go-to tool when you need to perform mathematical and statistical operations.

Why learn it?

A large portion of a data analyst’s work is dedicated to preprocessing datasets. Unquestionably, this involves tons of mathematical and statistical techniques that NumPy is renowned for. What’s more, the package introduces multi-dimensional array structures and provides a plethora of built-in functions and methods to use while working with them. In other words, NumPy can be described as a computationally stable state-of-the-art Python instrument that provides great flexibility and can take your analysis to the next level.
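
To make the difference between plain Python lists and NumPy arrays concrete, here is a small sketch of elementwise arithmetic and broadcasting. The numbers and variable names are hypothetical and serve only as an illustration.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]

# Multiplying a list by 2 repeats it; it does not double the values.
doubled_list = prices * 2          # six elements

# Multiplying an ndarray by 2 works elementwise.
arr = np.array(prices)
doubled_arr = arr * 2              # [20., 40., 60.]

# Broadcasting: the 1-D vector of column means is stretched across
# every row of the 2-D array during the subtraction.
table = np.array([[1.0, 10.0],
                  [3.0, 30.0]])
centered = table - table.mean(axis=0)
```

After the subtraction, each column of `centered` has a mean of zero, which is a common preprocessing step.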

Some of the topics we will cover:

  • Fundamentals of NumPy
  • Random Generators
  • Working with text files
  • Statistics with NumPy
  • Data preprocessing
  • Final practical example

What you’ll learn

  • Arrays.
  • The definition of a package/library.
  • Installing and Upgrading a package.
  • Navigating the documentation.
  • A history of NumPy.
  • The relationship between arrays and vectors.
  • Arrays vs Lists.
  • Indexing.
  • Assigning values to arrays.
  • Elementwise properties and operations.
  • Datatypes supported by ndarrays.
  • Broadcasting and type casting.
  • Running a function or method over a given axis.
  • Slicing, Stepwise Slicing, Conditional Slicing.
  • Dimensionality reduction in arrays.
  • Generating arrays full of identical values.
  • Generating non-random sequences of data.
  • Generating random data with Random Generators.
  • Generating random samples from a random probability distribution.
  • Importing and exporting data with and from NumPy.
  • NPY and NPZ files.
  • Maximums and Minimums.
  • Percentiles and Quantiles.
  • Mean and Variance.
  • Covariance and Correlation.
  • Calculating histograms.
  • Higher dimension histograms.
  • Finding and filling up missing values.
  • Substituting “filler” values.
  • Reshaping arrays.
  • Removing parts of arrays.
  • Removing parts of individual elements within arrays (stripping).
  • Sorting and Shuffling.
  • Argument Functions.
  • Stacking and Concatenating.
  • Finding the unique values within an array.
  • A comprehensive practical example of data cleaning and preprocessing.
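
The sketch below strings together a few of the skills listed above: importing delimited text, finding missing values, substituting fillers, and saving to the NPY format. The in-memory text, the semicolon delimiter, and the choice of column means as filler values are all assumptions made for this example; the course may well use a different filler strategy.

```python
import io
import numpy as np

# Hypothetical semicolon-separated text with two empty fields.
raw = io.StringIO("1.0;2.0;3.0\n4.0;;6.0\n7.0;8.0;\n")

# np.genfromtxt turns empty fields into NaN instead of raising an error.
data = np.genfromtxt(raw, delimiter=";")

# Locate the missing entries.
missing_mask = np.isnan(data)

# Column means that ignore NaN (an assumed filler choice, not the only one).
col_means = np.nanmean(data, axis=0)

# Replace each NaN with the mean of its column; other values are kept.
filled = np.where(missing_mask, col_means, data)

# Round-trip through the NPY format, using an in-memory buffer
# in place of a file on disk.
buffer = io.BytesIO()
np.save(buffer, filled)
buffer.seek(0)
restored = np.load(buffer)
```

In practice the buffer would be replaced with a file path such as a `.npy` file name, and `np.genfromtxt` would read from the raw data file directly.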

Table of Contents

Introduction to NumPy
1 What Does the Course Cover
2 Download All Resources
3 FAQ
4 The NumPy Package and Its Applications
5 Installing and Upgrading NumPy
6 What is an array
7 Using the NumPy Documentation
8 Introduction to NumPy – Exercise

Why Do We Use NumPy
9 A Brief History of NumPy
10 ndarrays
11 Arrays vs Lists
12 Why Do We Use NumPy – Exercise

NumPy Fundamentals
13 Indexing
14 Assigning Values
15 Elementwise Properties
16 NumPy Datatypes
17 Characteristics of NumPy Functions – Part 1
18 Characteristics of NumPy Functions – Part 2
19 NumPy Fundamentals – Exercise

Working with Arrays
20 Basic Slicing
21 Stepwise Slicing
22 Conditional Slicing
23 Dimensions and the Squeeze Function
24 Working with Arrays – Exercise

Generating Data with NumPy
25 Empty Arrays, Arrays of Identical Values
26 “_like” Functions
27 A Sequence of Numbers – np.arange()
28 Random Generators and Seeds
29 Random Integers, Probabilities and Choices
30 Random Probability Distributions
31 Applications of Random Generators
32 Generating Data with NumPy – Exercise

Importing and Saving Data
33 Importing Data with NumPy – np.loadtxt() vs np.genfromtxt()
34 Importing Data with NumPy – Simple Cleaning when Importing
35 Importing Data with NumPy – String vs Object vs Numbers
36 Importing Data with NumPy – Exercise
37 Saving Data with NumPy – NPY
38 Saving Data with NumPy – NPZ
39 Saving Data with NumPy – CSV
40 Importing and Saving Data – Exercise

Statistics with NumPy
41 Using NumPy Statistical Functions
42 Minimal and Maximal Values
43 Percentiles and Quantiles
44 Averages and Variance
45 Covariance and Correlation
46 Histogram – Part 1 1-D Histograms
47 Histogram – Part 2 Higher Dimension Histograms
48 NaN-Equivalent Functions
49 Statistics with NumPy – Exercise

Manipulating Data with NumPy
50 Checking for Missing Values
51 Substituting Filler Values
52 Reshaping Arrays
53 Removing Values
54 Sorting Arrays
55 Argument Functions – Part 1 Argument Sort
56 Argument Functions – Part 2 Argument Where
57 Shuffling Data
58 Casting Arrays
59 Stripping Symbols from Arrays
60 Stacking Arrays
61 Concatenating Arrays
62 Finding Unique Values in Arrays

A NumPy Practical Example
63 Setting Up – Introduction to the Practical Example
64 Setting Up – Importing the Data Set
65 Setting Up – Checking for Incomplete Data
66 Setting Up – Splitting the Dataset
67 Setting Up – Creating Checkpoints
68 Manipulating Text Data – Issue Date
69 Manipulating Text Data – Loan Status and Term
70 Manipulating Text Data – Grade and Sub Grade
71 Manipulating Text Data – Verification Status & URL
72 Manipulating Text Data – State Address
73 Manipulating Text Data – Converting Strings and Creating a Checkpoint
74 Manipulating Numeric Data – Substitute Filler Values
75 Manipulating Numeric Data – Currency Change – The Exchange Rate
76 Manipulating Numeric Data – Currency Change – From USD to EUR
77 Completing the Dataset