Testing Python Data Science Code

English | MP4 | AVC 1280×720 | AAC 48 kHz 2ch | 0h 53m | 187 MB

The larger and more complex the world of data science becomes, the more data there is to collect, sort, clean, and model. A lot can go wrong when data engineering and development practices are sloppy. This advanced-level course shows data scientists, Python developers, and data analysts how to test scientific (data science) code written in Python. Veteran data science trainer and consultant Miki Tebeka covers testing techniques, focusing on issues specific to data science code, such as floating point errors, statistical testing, working with large datasets, and choosing a baseline. After an overview of testing, Miki dives into testing with pytest and hypothesis. He explains how to use schemas, truth values, approximate testing, and related techniques for data validation. He then covers regression testing and demonstrates how to test Jupyter Notebooks.
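As a rough illustration of the floating point and approximate-testing issues the course description mentions, here is a minimal sketch. It is not taken from the course materials: the `mean` helper and both test names are hypothetical, and it uses only the standard library's `math.isclose` so that it runs without extra dependencies.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers (hypothetical helper)."""
    return sum(values) / len(values)


def test_exact_comparison_is_fragile():
    # 0.1 + 0.2 is not exactly 0.3 in binary floating point,
    # so an exact equality check on floats can fail unexpectedly.
    assert 0.1 + 0.2 != 0.3


def test_mean_approximate():
    result = mean([0.1, 0.2, 0.3])
    # Compare within a relative tolerance instead of using ==.
    assert math.isclose(result, 0.2, rel_tol=1e-9)
```

Functions named `test_*` like these are picked up by pytest automatically. In a pytest suite, `pytest.approx` fills a similar role to `math.isclose`, for example `assert result == pytest.approx(0.2)`.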

Table of Contents

Introduction
1 Testing scientific applications
2 What you should know
3 Setting up

Testing Overview
4 Why test
5 Types of tests
6 Challenges in testing scientific applications
7 Continuous integration overview

pytest
8 pytest overview
9 Selecting tests
10 Parametrized tests
11 Fixtures
12 Mocking
13 Challenge: Test with pytest
14 Solution: Test with pytest

hypothesis
15 Overview of hypothesis
16 Testing with hypothesis
17 NumPy utilities
18 pandas utilities
19 Writing strategies
20 Challenge: Test with hypothesis
21 Solution: Test with hypothesis

Data Validation
22 Using schemas
23 Truth values
24 Floating point wonders
25 Approximate testing
26 Dealing with randomness
27 Comparing pandas DataFrames
28 Challenge: Testing numerical code
29 Solution: Testing numerical code

Regression Testing
30 Regression testing overview
31 Selecting regression data
32 Choosing quality metrics and baseline
33 Quality regression testing
34 Choosing speed and memory metrics
35 Performance regression testing

Testing Jupyter Notebooks
36 Testing Notebooks overview
37 Using nbconvert
38 Refactoring code
39 Other test libraries

Conclusion
40 Next steps
