The Data Analyst Course: Complete Data Analyst Bootcamp 2020

English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 19.5 Hours | 7.95 GB

Complete Data Analyst Training: Python, NumPy, Pandas, Data Collection, Preprocessing, Data Types, Data Visualization

The problem

Most data analyst, data science, and coding courses miss a critical practical step. They don’t teach you how to work with raw data and how to clean and preprocess it. This creates a sizeable gap between the skills you need on the job and the abilities you have acquired in training. Truth be told, real-world data is messy, so you need to know how to overcome this obstacle to become an independent data professional.

The bootcamps we have seen online, and even live classes, neglect this aspect and show you only how to work with ‘clean’ data. But this isn’t doing you a favour. In reality, it will set you back both when you are applying for jobs and when you’re on the job.

The solution

Our goal is to provide you with complete preparation. And this course will turn you into a job-ready data analyst. To take you there, we will cover the following fundamental topics extensively.

  • Theory about the field of data analytics
  • Basic Python
  • Advanced Python
  • NumPy
  • Pandas
  • Working with text files
  • Data collection
  • Data cleaning
  • Data preprocessing
  • Data visualization
  • Final practical example

Each of these subjects builds on the previous ones. And this is precisely what makes our curriculum so valuable. Everything is shown in the right order, and we guarantee that you are not going to get lost along the way, as we have provided all necessary steps in video (not a single one skipped). In other words, we are not going to teach you how to analyse data before you know how to gather and clean it.

So, to prepare you for the entry-level job that leads to a data science position – data analyst – we created The Data Analyst Course.

This is a rather unique training program because it teaches the fundamentals you need on the job, a frequently neglected aspect of vital importance.

Moreover, our focus is to teach topics that flow smoothly and complement each other. The course provides complete preparation for someone who wants to become a data analyst at a fraction of the cost of traditional programs (not to mention the amount of time you will save). We believe that this resource will significantly boost your chances of landing a job, as it will prepare you for practical tasks and concepts that are frequently included in interviews.

The topics we will cover

1. Theory about the field of data analytics
2. Basic Python
3. Advanced Python
4. NumPy
5. Pandas
6. Working with text files
7. Data collection
8. Data cleaning
9. Data preprocessing
10. Data visualization
11. Final practical example

1. Theory about the field of data analytics

Here we will focus on the big picture. But don’t imagine long, boring pages with terms you’ll have to look up in a dictionary every minute. Instead, this is where we want to define who a data analyst is, what they do, and how they create value for an organization.

Why learn it?

You need a general understanding to appreciate how every part of the course fits in with the rest of the content. As they say, if you know where you are going, chances are that you will eventually get there. And since data analyst positions and other data jobs are relatively new and constantly evolving, we want to provide you with a good grasp of the data analyst role specifically. Then, in the following chapters, we will teach you the actual tools you need to become a data analyst.

2. Basic Python

This course is centred around Python. So, we’ll start from the very basics. Don’t be afraid if you do not have prior programming experience.

Why learn it?

You need to learn a programming language to take full advantage of the data-rich world we live in. Unless you are equipped with such a skill, you will always be dependent on other people’s ability to extract and manipulate data, and you want to be independent while doing analysis, right? Also, you don’t necessarily need to learn many programming languages at once. It is enough to be very skilled at just one, and we’ve naturally chosen Python, which has established itself as the number one language for data analysis and data science (thanks to its rich libraries and versatility).

3. Advanced Python

We will introduce advanced Python topics, such as working with text data, and tools like list comprehensions and anonymous functions.

Why learn it?

These lessons will turn you into a proficient Python user who is independent on the job. You will be able to use Python’s core strengths to your advantage. So, here it is not just about the topics, it is also about the depth in which we explore the most relevant Python tools.
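As a small illustration of the two tools named above (not an excerpt from the course materials), the sketch below cleans a hypothetical list of city names with a list comprehension and then sorts the result using an anonymous (lambda) function as the sort key.

```python
# Hypothetical raw input: city names as they might arrive from a text file
raw_cities = ["  london", "Paris ", "", "berlin", "  "]

# List comprehension: strip whitespace, skip empty entries, normalise capitalisation
cities = [name.strip().title() for name in raw_cities if name.strip()]

# Lambda as a sort key: order by name length first, then alphabetically
cities_sorted = sorted(cities, key=lambda name: (len(name), name))

print(cities)         # ['London', 'Paris', 'Berlin']
print(cities_sorted)  # ['Paris', 'Berlin', 'London']
```

The same comprehension could be written as a `for` loop with an `if` and `append()`; the one-line form is simply more compact for this kind of filter-and-transform step.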

4. NumPy

NumPy is Python’s fundamental package for scientific computing. It has established itself as the go-to tool when you need to perform mathematical and statistical operations.

Why learn it?

A large portion of a data analyst’s work is dedicated to preprocessing datasets. Unquestionably, this involves tons of mathematical and statistical techniques that NumPy is renowned for. In addition, the package introduces multi-dimensional array structures and provides a plethora of built-in functions and methods to use while working with them. In other words, NumPy can be described as a computationally stable state-of-the-art Python instrument that provides flexibility and can take your analysis to the next level.
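A minimal sketch of the multi-dimensional arrays and built-in statistical functions described above, using a made-up 3×2 array; the variable names are illustrative only.

```python
import numpy as np

# A small 2-D array: rows are observations, columns are variables (made-up values)
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 60.0]])

print(data.shape)               # (3, 2): 3 rows, 2 columns
print(data.dtype, data.nbytes)  # float64 48: six 8-byte floats

# Built-in statistical functions, computed per column with axis=0
col_means = data.mean(axis=0)          # column means of 2 and 30
col_medians = np.median(data, axis=0)  # column medians of 2 and 20

# Vectorised elementwise arithmetic: no explicit Python loop is needed
scaled = data * 2
```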

5. Pandas

The pandas library is one of the most popular Python tools that facilitate data manipulation and analysis. It is very valuable because you can use it to manipulate all sorts of information – numerical tables and time series data, as well as text.

Why learn it?

Pandas is the other main tool an analyst needs to clean and preprocess the data they are working with. Its data manipulation features are second to none in Python because of the diversity and richness of the methods and functions it provides. The combined ability to work with both NumPy and pandas is extremely powerful, as the two libraries complement each other. You need to be capable of working with both to produce a complete and consistent analysis independently.
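To illustrate how the two libraries complement each other, here is a hypothetical example: pandas handles the labelled columns and the grouping, while a NumPy function is applied directly to a column. The data and column names are invented for the example.

```python
import numpy as np
import pandas as pd

# Invented data: a DataFrame mixing a text column and a numeric column with a gap
df = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "sales": [100.0, 80.0, 120.0, np.nan],
})

# pandas: group by label; mean() skips missing values by default
avg_by_region = df.groupby("region")["sales"].mean()

# NumPy: a ufunc applied directly to a pandas column (NaN stays NaN)
df["log_sales"] = np.log(df["sales"])

print(avg_by_region)  # North 110.0, South 80.0
```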

6. Working with text files

Text files are practically how we exchange information today. In this part of the course, we will use the Python, pandas, and NumPy tools learned earlier to give you the essentials you need when importing or saving data.

Why learn it?

In many courses, you are just given a dataset to practice your analytical and programming skills. However, we don’t want to close our eyes to reality, where converting a raw dataset from an external file into a workable Python format can be a massive challenge.
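The sketch below shows the kind of import step this part of the course deals with, assuming a small made-up CSV with missing fields. `io.StringIO` stands in for a real file on disk; with an actual file you would pass its path instead.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical contents of a raw .csv file, with two empty fields
raw_csv = """id,price,quantity
1,9.99,3
2,,5
3,4.50,
"""

# pandas: parse the file and use the 'id' column as the index
df = pd.read_csv(io.StringIO(raw_csv), index_col="id")

# NumPy: genfromtxt skips the header row and fills empty fields with NaN
arr = np.genfromtxt(io.StringIO(raw_csv), delimiter=",", skip_header=1)

print(df.isna().sum().sum())  # 2 missing values
print(arr.shape)              # (3, 3)
```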

7. Data collection

In the real world, you don’t always have the data readily available for you. In this part of the course, you will learn how to retrieve data from an API.

Why learn it?

You need to know how to source your data, right? To be a well-rounded analyst you must be able to collect data from outside sources. This is rarely a one-click process. This section aims at providing you with all the necessary tools to do that on your own.

8. Data cleaning

The next logical step is to clean your data. This is where you will put into practice the pandas skills acquired earlier. All lessons throughout the course have a real-world perspective.

Why learn it?

A large part of a data analyst’s job in the real world involves cleaning data and preparing it for the actual analysis. You can’t expect that you’ll deal with flawless data sources, right? So, it will be up to you to overcome this stage and clean your data.
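As an example of the cleaning steps described above, this sketch tidies a hypothetical DataFrame with method chaining: it drops an exact duplicate row, strips and normalises text, and coerces unparseable numbers to `NaN`. The data is invented for illustration.

```python
import pandas as pd

# Invented messy records: stray whitespace, inconsistent case,
# a duplicated row, and a non-numeric value in a numeric column
raw = pd.DataFrame({
    "name": [" Alice", "bob ", "bob ", "Carol"],
    "age": ["34", "29", "29", "unknown"],
})

clean = (
    raw.drop_duplicates()  # remove the repeated 'bob ' row
       .assign(
           # tidy the text: trim whitespace, then capitalise each name
           name=lambda d: d["name"].str.strip().str.title(),
           # convert to numbers; values that cannot be parsed become NaN
           age=lambda d: pd.to_numeric(d["age"], errors="coerce"),
       )
       .reset_index(drop=True)  # renumber rows 0..n-1 after dropping
)

print(clean)
```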

9. Data preprocessing

Even when your dataset is clean and in an understandable shape, it isn’t quite ready to be processed for visualizations and analysis just yet. There is a crucial step in between, and that’s data preprocessing.

Why learn it?

Data preprocessing is where a data analyst can demonstrate how good or great they are at their job. This stage of the work requires the ability to choose the right statistical tool that will improve the quality of your dataset and the knowledge to implement it with advanced pandas and NumPy techniques. Only when you’ve completed this step can you say that your dataset is preprocessed and ready for the next part, which is data visualization.
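Mean substitution followed by standardisation is one common preprocessing sequence (not necessarily the one the course uses, and not always the right statistical choice for a given dataset). The sketch below applies both with NumPy to a made-up array containing missing values.

```python
import numpy as np

# Made-up numeric dataset with missing entries
data = np.array([[1.0, 200.0],
                 [np.nan, 400.0],
                 [3.0, np.nan]])

# Substitute each missing value with its column mean (NaNs ignored in the mean)
col_means = np.nanmean(data, axis=0)
filled = np.where(np.isnan(data), col_means, data)

# Standardise each column to mean 0 and (population) standard deviation 1
standardised = (filled - filled.mean(axis=0)) / filled.std(axis=0)
```

`np.where` broadcasts the 1-D `col_means` across every row, so each `NaN` is replaced by the mean of its own column.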

10. Data visualization

Data visualization is the face of data. Many people look at the data and see nothing. The reason for that is that they are not creating good visualizations. Or even worse – they are creating nice graphs but cannot interpret them accurately.

Why learn it?

This part of the course will teach you how to use your data to produce meaningful insights. At the end of the day, data charts are what convey the most information in the shortest amount of time. And nothing speaks better than a well-crafted and meaningful data visualization.
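A minimal bar chart sketch with matplotlib, assuming invented category totals. It selects the non-interactive `Agg` backend so it runs without a display, and it uses `Axes.bar_label`, which requires matplotlib 3.4 or later. In a Jupyter notebook you can leave out the backend lines and the chart is shown inline.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: runs without a display
import matplotlib.pyplot as plt

# Invented category totals
categories = ["Product A", "Product B", "Product C"]
totals = [120, 95, 60]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(categories, totals, color="steelblue")
ax.set_title("Sales by product")
ax.set_ylabel("Units sold")
ax.bar_label(bars)  # write each bar's value above it
fig.tight_layout()
# fig.savefig("sales_by_product.png", dpi=100)  # uncomment to write the chart to disk
```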

11. Final practical example

The course contains plenty of exercises and practical cases. In the end, we have included a comprehensive practical example that will show you how everything you have learned along the way comes nicely together. This is where you will be able to appreciate how far you have come in your journey to becoming a data analyst and starting your data career.

What you’ll learn

  • The course provides the complete preparation you need to become a data analyst
  • Fill up your resume with in-demand data skills: Python programming, NumPy, pandas, data preparation – data collection, data cleaning, data preprocessing, data visualization; data analysis, data analytics
  • Acquire a big picture understanding of the data analyst role
  • Learn beginner and advanced Python
  • Study mathematics for Python
  • Learn NumPy and pandas, from the basics to advanced topics
  • Be able to work with text files
  • Understand different data types and their memory usage
  • Learn how to obtain interesting, real-time information from an API with a simple script
  • Clean data with pandas Series and DataFrames
  • Complete a data cleaning exercise on absenteeism rate
  • Expand your knowledge of NumPy – statistics and preprocessing
  • Go through a complete loan data case study and apply your NumPy skills
  • Master data visualization
  • Learn how to create pie, bar, line, area, histogram, scatter, regression, and combo charts
  • Engage with coding exercises that will prepare you for the job
  • Practice with real-world data
  • Solve a final capstone project

Table of Contents

Introduction to the Course
1 A Practical Example – What Will You Learn in This Course
2 What Does the Course Cover
3 Download All Resources
4 FAQ

Introduction to Data Analytics
5 Introduction to the World of Business and Data
6 Relevant Terms Explained
7 Data Analyst Compared to Other Data Jobs
8 Data Analyst Job Description
9 Why Python

Setting up the Environment
10 Introduction
11 Programming Explained in a Few Minutes
12 Jupyter – Introduction
13 Jupyter – Installing Anaconda
14 Jupyter – Intro to Using Jupyter
15 Jupyter – Working with Notebook Files
16 Jupyter – Using Shortcuts
17 Jupyter – Handling Error Messages
18 Jupyter – Restarting the Kernel

Python Basics
19 Python Variables
20 Types of Data – Numbers and Boolean Values
21 Types of Data – Strings
22 Basic Python Syntax – Arithmetic Operators
23 Basic Python Syntax – The Double Equality Sign
24 Basic Python Syntax – Reassign Values
25 Basic Python Syntax – Add Comments
26 Basic Python Syntax – Line Continuation
27 Basic Python Syntax – Indexing Elements
28 Basic Python Syntax – Indentation
29 Operators – Comparison Operators
30 Operators – Logical and Identity Operators
31 Conditional Statements – The IF Statement
32 Conditional Statements – The ELSE Statement
33 Conditional Statements – The ELIF Statement
34 Conditional Statements – A Note on Boolean Values
35 Functions – Defining a Function in Python
36 Functions – Creating a Function with a Parameter
37 Functions – Another Way to Define a Function
38 Functions – Using a Function in Another Function
39 Functions – Combining Conditional Statements and Functions
40 Functions – Creating Functions That Contain a Few Arguments
41 Functions – Notable Built-in Functions in Python
42 Sequences – Lists
43 Sequences – Using Methods
44 Sequences – List Slicing
45 Sequences – Tuples
46 Sequences – Dictionaries
47 Iteration – For Loops
48 Iteration – While Loops and Incrementing
49 Iteration – Create Lists with the range() Function
50 Iteration – Use Conditional Statements and Loops Together
51 Iteration – Conditional Statements, Functions, and Loops
52 Iteration – Iterating over Dictionaries

Fundamentals for Coding in Python
53 Object-Oriented Programming (OOP)
54 Modules, Packages, and the Python Standard Library
55 Importing Modules
56 Introduction to Using NumPy and pandas
57 What is Software Documentation
58 The Python Documentation

Mathematics for Python
59 What Is a Matrix
60 Scalars and Vectors
61 Linear Algebra and Geometry
62 Arrays in Python
63 What Is a Tensor
64 Adding and Subtracting Matrices
65 Errors When Adding Matrices
66 Transpose
67 Dot Product of Vectors
68 Dot Product of Matrices
69 Why is Linear Algebra Useful

NumPy Basics
70 The NumPy Package and Why We Use It
71 Installing/Upgrading NumPy
72 Ndarray
73 The NumPy Documentation
74 NumPy Basics – Exercise

Pandas – Basics
75 Introduction to the pandas Library
76 Installing and Running pandas
77 Introduction to pandas Series
78 Working with Attributes in Python
79 Using an Index in pandas
80 Label-based vs Position-based Indexing
81 More on Working with Indices in Python
82 Using Methods in Python – Part I
83 Using Methods in Python – Part II
84 Parameters vs Arguments
85 The pandas Documentation
86 Introduction to pandas DataFrames
87 Creating DataFrames from Scratch – Part I
88 Creating DataFrames from Scratch – Part II
89 Additional Notes on Using DataFrames
90 pandas Basics – Conclusion

Working with Text Files
91 Working with Files in Python – An Introduction
92 File vs File Object, Read vs Parse
93 Structured vs Semi-Structured and Unstructured Data
94 Data Connectivity through Text Files
95 Principles of Importing Data in Python
96 More on Text Files (.txt vs .csv)
97 Fixed-width Files
98 Common Naming Conventions Used in Programming
99 Importing Text Files in Python ( open() )
100 Importing Text Files in Python ( with open() )
101 Importing .csv Files with pandas – Part I
102 Importing .csv Files with pandas – Part II
103 Importing .csv Files with pandas – Part III
104 Importing Data with the index_col Parameter
105 Importing Data with NumPy – .loadtxt() vs genfromtxt()
106 Importing Data with NumPy – Partial Cleaning While Importing
107 Importing Data with NumPy – Exercise
108 Importing .json Files
109 Prelude to Working with Excel Files in Python
110 Working with Excel Data (the .xlsx Format)
111 An Important Exercise on Importing Data in Python
112 Importing Data with the pandas’ Squeeze Parameter
113 A Note on Importing Files in Jupyter
114 Saving Your Data with pandas
115 Saving Your Data with NumPy – np.save()
116 Saving Your Data with NumPy – np.savez()
117 Saving Your Data with NumPy – np.savetxt()
118 Saving Your Data with NumPy – Exercise
119 Working with Text Files – Conclusion

Working with Text Data
120 Using the .format() Method

Must-Know Python Tools
121 Iterating Over Range Objects
122 Nested For Loops – Introduction
123 Triple Nested For Loops
124 List Comprehensions
125 Anonymous (Lambda) Functions

Data Gathering/Data Collection
126 What Is Data Gathering/Data Collection

APIs (POST requests are not needed for this course)
127 Overview of APIs
128 GET and POST Requests
129 Data Exchange Format for APIs: JSON
130 Introducing the Exchange Rates API
131 Including Parameters in a GET Request
132 More Functionalities of the Exchange Rates API
133 Coding a Simple Currency Conversion Calculator
134 iTunes API
135 iTunes API: Homework
136 iTunes API: Structuring and Exporting the Data
137 Pagination: GitHub API
138 APIs Exercise

Data Cleaning and Data Preprocessing
139 Data Cleaning and Data Preprocessing

pandas Series
140 .unique(), .nunique()
141 Converting Series into Arrays
142 .sort_values()
143 Attribute and Method Chaining
144 .sort_index()

NumPy Fundamentals
145 Indexing in NumPy
146 Assigning Values in NumPy
147 Elementwise Properties of Arrays
148 Types of Data Supported by NumPy
149 Characteristics of NumPy Functions Part 1
150 Characteristics of NumPy Functions Part 2
151 NumPy Fundamentals – Exercise

NumPy DataTypes
152 ndarrays
153 Arrays vs Lists
154 Strings vs Object vs Number
155 NumPy DataTypes – Exercise

Working with Arrays
156 Basic Slicing in NumPy
157 Stepwise Slicing in NumPy
158 Conditional Slicing in NumPy
159 Dimensions and the Squeeze Function
160 Working with Arrays – Exercise

Generating Data with NumPy
161 Arrays of 0s and 1s
162 ‘_like’ Functions in NumPy
163 A Non-Random Sequence of Numbers
164 Random Generators and Seeds
165 Basic Random Functions in NumPy
166 Probability Distributions in NumPy
167 Applications of Random Data in NumPy
168 Generating Data with NumPy – Exercise

Statistics with NumPy
169 Using Statistical Functions in NumPy
170 Minimal and Maximal Values in NumPy
171 Statistical Order Functions in NumPy
172 Averages and Variance in NumPy
173 Covariance and Correlation in NumPy
174 Histograms in NumPy (Part 1)
175 Histograms in NumPy (Part 2)
176 NaN-Equivalent Functions in NumPy
177 Statistics with NumPy – Exercise

NumPy – Preprocessing
178 Checking for Missing Values in Ndarrays
179 Substituting Missing Values in Ndarrays
180 Reshaping Ndarrays
181 Removing Values from Ndarrays
182 Sorting Ndarrays
183 Argument Sort in NumPy
184 Argument Where in NumPy
185 Shuffling Ndarrays
186 Casting Ndarrays
187 Stripping Values from Ndarrays
188 Stacking Ndarrays
189 Concatenating Ndarrays
190 Finding Unique Values in Ndarrays

A Loan Data Example with NumPy
191 Setting Up: Introduction to the Practical Example
192 Setting Up: Importing the Data Set
193 Setting Up: Checking for Incomplete Data
194 Setting Up: Splitting the Dataset
195 Setting Up: Creating Checkpoints
196 Manipulating Text Data: Issue Date
197 Manipulating Text Data: Loan Status and Term
198 Manipulating Text Data: Grade and Sub Grade
199 Manipulating Text Data: Verification Status & URL
200 Manipulating Text Data: State Address
201 Manipulating Text Data: Converting Strings and Creating a Checkpoint
202 Manipulating Numeric Data: Substitute Filler Values
203 Manipulating Numeric Data: Currency Change – The Exchange Rate
204 Manipulating Numeric Data: Currency Change – From USD to EUR
204 Manipulating Numeric Data Currency Change – From USD to EUR
205 Completing the Dataset

The Absenteeism Exercise – Introduction
206 An Introduction to the Absenteeism Exercise
207 The Absenteeism Exercise from a Business Perspective
208 The Dataset

Solution to the Absenteeism Exercise
209 How to Complete the Absenteeism Exercise
210 Eyeball Your Data First
211 Note: Programming vs the Rest of the World
212 Using a Statistical Approach to Solve Our Exercise
213 Dropping the ‘ID’ Column
214 Analysis of the ‘Reason for Absence’ Column
215 Splitting the Reasons for Absence into Multiple Dummy Variables
216 Working with Dummy Variables – A Statistical Perspective
217 Grouping the Reason for Absence Columns
218 Concatenating Columns in a pandas DataFrame
219 Reordering Columns in a DataFrame
220 Working on the ‘Date’ Column
221 Extracting the Month Value from the ‘Date’ Column
222 Creating the ‘Day of the Week’ Column
223 Understanding the Meaning of 5 More Columns
224 Modifying the ‘Education’ Column
225 Final Remarks on the Absenteeism Exercise

Data Visualization
226 What Is Data Visualization and Why Is It Important
227 Why Learn Data Visualization
228 Choosing the Right Visualization – What Are Some Popular Approaches and Frameworks
229 Introduction to Colors and Color Theory
230 Bar Chart – Introduction – General Theory and Getting to Know the Dataset
231 Bar Chart – How to Create a Bar Chart Using Python
232 Bar Chart – Interpreting the Bar Graph. How to Make a Good Bar Graph
233 Pie Chart – Introduction – General Theory and Dataset
234 Pie Chart – How to Create a Pie Chart Using Python
235 Pie Chart – Interpreting the Pie Chart
236 Pie Chart – Why You Should Never Create a Pie Graph
237 Stacked Area Chart – Introduction – General Theory. Getting to Know the Dataset
238 Stacked Area Chart – How to Create a Stacked Area Chart Using Python
239 Stacked Area Chart – Interpreting the Stacked Area Graph
240 Stacked Area Chart – How to Make a Good Stacked Area Chart
241 Line Chart – Introduction – General Theory. Getting to Know the Dataset
242 Line Chart – How to Create a Line Chart in Python
243 Line Chart – Interpretation
244 Line Chart – How to Make a Good Line Chart
245 Histogram – Introduction – General Theory. Getting to Know the Dataset
246 Histogram – How to Create a Histogram Using Python
247 Histogram – Interpreting the Histogram
248 Histogram – Choosing the Number of Bins in a Histogram
249 Histogram – How to Make a Good Histogram
250 Scatter Plot – Introduction – General Theory. Getting to Know the Dataset
251 Scatter Plot – How to Create a Scatter Plot Using Python
252 Scatter Plot – Interpreting the Scatter Plot
253 Scatter Plot – How to Make a Good Scatter Plot
254 Regression Plot – Introduction – General Theory. Getting to Know the Dataset
255 Regression Plot – How to Create a Regression Scatter Plot Using Python
256 Regression Plot – Interpreting the Regression Scatter Plot
257 Regression Plot – How to Make a Good Regression Plot
258 Bar and Line Chart – Introduction – General Theory. Getting to Know the Dataset
259 Bar and Line Chart – How to Create a Combination Bar and Line Graph Using Python
260 Bar and Line Chart – Interpreting the Combination Bar and Line Graph
261 Bar and Line Chart – How to Make a Good Bar and Line Graph
262 Data Visualization – Exercise

Conclusion
263 Conclusion