Advanced Reinforcement Learning in Python: from DQN to SAC

English | MP4 | AVC 1280×720 | AAC 44 kHz 2ch | 112 lectures (8h 5m) | 2.41 GB

Build Artificial Intelligence (AI) agents using Deep Reinforcement Learning and PyTorch: DDPG, TD3, SAC, NAF, HER.

This is the most complete Advanced Reinforcement Learning course on Udemy. In it, you will learn to implement some of the most powerful Deep Reinforcement Learning algorithms in Python using PyTorch and PyTorch Lightning. You will implement, from scratch, adaptive algorithms that solve control tasks based on experience. You will then learn to combine these techniques with Neural Networks and Deep Learning methods to create adaptive Artificial Intelligence agents capable of solving decision-making tasks.

This course will introduce you to the state of the art in Reinforcement Learning techniques. It will also prepare you for the next courses in this series, where we will explore other advanced methods that excel in other types of tasks.

The course is focused on developing practical skills. Therefore, after learning the most important concepts of each family of methods, we will implement one or more of their algorithms from scratch in Jupyter notebooks.

Leveling modules:

  • Refresher: The Markov decision process (MDP).
  • Refresher: Q-Learning.
  • Refresher: Brief introduction to Neural Networks.
  • Refresher: Deep Q-Learning.
  • Refresher: Policy gradient methods.

Advanced Reinforcement Learning:

  • PyTorch Lightning.
  • Hyperparameter tuning with Optuna.
  • Deep Q-Learning for continuous action spaces (Normalized advantage function NAF).
  • Deep Deterministic Policy Gradient (DDPG).
  • Twin Delayed DDPG (TD3).
  • Soft Actor-Critic (SAC).
  • Hindsight Experience Replay (HER).

What you’ll learn

  • Master some of the most advanced Reinforcement Learning algorithms.
  • Learn how to create AIs that can act in a complex environment to achieve their goals.
  • Create advanced Reinforcement Learning agents from scratch using Python’s most popular tools (PyTorch Lightning, OpenAI Gym, Brax, Optuna).
  • Learn how to perform hyperparameter tuning (choosing the best experimental conditions for our AI to learn).
  • Fundamentally understand the learning process for each algorithm.
  • Debug and extend the algorithms presented.
  • Understand and implement new algorithms from research papers.

Table of Contents

Introduction
1 Introduction
2 Reinforcement Learning series
3 Google Colab
4 Where to begin

Refresher: The Markov Decision Process (MDP)
5 Module Overview
6 Elements common to all control tasks
7 The Markov decision process (MDP)
8 Types of Markov decision process
9 Trajectory vs episode
10 Reward vs Return
11 Discount factor
12 Policy
13 State values v(s) and action values q(s,a)
14 Bellman equations
15 Solving a Markov decision process
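
For reference, the two Bellman expectation equations covered in this module, in standard notation (v for state values, q for action values, γ the discount factor):

```latex
v_\pi(s) = \sum_{a} \pi(a \mid s)\, q_\pi(s, a)
\qquad
q_\pi(s, a) = \sum_{s', r} p(s', r \mid s, a)\left[ r + \gamma\, v_\pi(s') \right]
```

Solving the MDP means finding a policy π that maximizes these values in every state.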

Refresher: Q-Learning
16 Module overview
17 Temporal difference methods
18 Solving control tasks with temporal difference methods
19 Q-Learning
20 Advantages of temporal difference methods
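
As a preview of this module, here is a minimal sketch of the tabular Q-Learning update; the grid size and hyperparameters are illustrative, not the course's code:

```python
import numpy as np

# Tabular Q-Learning on a small, hypothetical grid world.
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1

def q_learning_step(s, a, r, s_next, done):
    # Off-policy TD target: bootstrap from the greedy action in s_next.
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

def epsilon_greedy(s):
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)  # explore
    return int(Q[s].argmax())                # exploit
```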

Refresher: Brief introduction to Neural Networks
21 Module overview
22 Function approximators
23 Artificial Neural Networks
24 Artificial Neurons
25 How to represent a Neural Network
26 Stochastic Gradient Descent
27 Neural Network optimization
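
To make the refresher concrete, here is a minimal PyTorch sketch of a neural network as a function approximator, updated with one step of stochastic gradient descent (sizes and data are illustrative):

```python
import torch
from torch import nn

# A small MLP as a generic function approximator.
net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(net.parameters(), lr=1e-2)

x = torch.randn(32, 4)  # a batch of inputs
y = torch.randn(32, 2)  # regression targets

loss = nn.functional.mse_loss(net(x), y)
optimizer.zero_grad()
loss.backward()          # backpropagate gradients
optimizer.step()         # one stochastic gradient descent update
```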

Refresher: Deep Q-Learning
28 Module overview
29 Deep Q-Learning
30 Experience Replay
31 Target Network
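
The two stabilizing ideas in this module, experience replay and a target network, come together in the loss computation. A minimal sketch, assuming q_net and target_net are identical PyTorch modules and the batch is sampled from a replay buffer:

```python
import torch

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    states, actions, rewards, dones, next_states = batch
    # Q-values of the actions that were actually taken.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # The target network is held fixed to stabilize the bootstrapped target.
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)
    return torch.nn.functional.smooth_l1_loss(q_values, targets)
```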

PyTorch Lightning
32 PyTorch Lightning
33 Link to the code notebook
34 Introduction to PyTorch Lightning
35 Create the Deep Q-Network
36 Create the policy
37 Create the replay buffer
38 Create the environment
39 Define the class for the Deep Q-Learning algorithm
40 Define the play_episode() function
41 Prepare the data loader and the optimizer
42 Define the train_step() method
43 Define the train_epoch_end() method
44 [Important] Lecture correction
45 Train the Deep Q-Learning algorithm
46 Explore the resulting agent
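
The hook names below (training_step, configure_optimizers) are PyTorch Lightning's own API; the body is only a sketch of how a Deep Q-Learning module might be laid out, not the course's exact notebook code:

```python
import torch
import pytorch_lightning as pl

class DeepQLearning(pl.LightningModule):
    def __init__(self, q_net, target_net, lr=1e-3, gamma=0.99):
        super().__init__()
        self.q_net, self.target_net = q_net, target_net
        self.lr, self.gamma = lr, gamma

    def training_step(self, batch, batch_idx):
        states, actions, rewards, dones, next_states = batch
        q = self.q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            next_q = self.target_net(next_states).max(dim=1).values
            target = rewards + self.gamma * next_q * (1.0 - dones)
        loss = torch.nn.functional.smooth_l1_loss(q, target)
        self.log("loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.q_net.parameters(), lr=self.lr)
```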

Hyperparameter tuning with Optuna
47 Hyperparameter tuning with Optuna
48 Link to the code notebook
49 Log average return
50 Define the objective function
51 Create and launch the hyperparameter tuning job
52 Explore the best trial
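
A minimal sketch of an Optuna tuning job; train_and_evaluate is a hypothetical helper standing in for a full training run that returns the agent's average return:

```python
import optuna

def objective(trial):
    # Sample a hyperparameter configuration for this trial.
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.999)
    batch_size = trial.suggest_categorical("batch_size", [32, 64, 128])
    # Hypothetical helper: trains the agent, returns average return.
    return train_and_evaluate(lr=lr, gamma=gamma, batch_size=batch_size)

study = optuna.create_study(direction="maximize")  # maximize average return
study.optimize(objective, n_trials=50)
print(study.best_trial.params)
```

Optuna samples a new configuration for each trial and keeps the best-performing combination in study.best_trial.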

Deep Q-Learning for continuous action spaces (Normalized Advantage Function)
53 Continuous action spaces
54 The advantage function
55 Normalized Advantage Function (NAF)
56 Normalized Advantage Function pseudocode
57 Link to the code notebook
58 Hyperbolic tangent
59 Creating the (NAF) Deep Q-Network 1
60 Creating the (NAF) Deep Q-Network 2
61 Creating the (NAF) Deep Q-Network 3
62 Creating the (NAF) Deep Q-Network 4
63 Creating the policy
64 Create the environment
65 Polyak averaging
66 Implementing Polyak averaging
67 Create the (NAF) Deep Q-Learning algorithm
68 Implement the training step
69 Implement the end-of-epoch logic
70 Debugging and launching the algorithm
71 Checking the resulting agent
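
The key idea behind NAF, in the notation of the original paper (Gu et al., 2016): the Q-function is decomposed so that its maximizing action is available in closed form,

```latex
Q(s, a) = V(s) + A(s, a),
\qquad
A(s, a) = -\tfrac{1}{2}\, (a - \mu(s))^{\top} P(s)\, (a - \mu(s))
```

Because P(s) = L(s)L(s)ᵀ is positive definite, A(s, a) ≤ 0 everywhere and the greedy action is simply μ(s), which is what makes Q-Learning tractable in continuous action spaces.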

Refresher: Policy gradient methods
72 Policy gradient methods
73 Policy performance
74 Representing policies using neural networks
75 The policy gradient theorem
76 Entropy Regularization
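
For reference, the policy gradient theorem at the heart of this module, in one common form:

```latex
\nabla_\theta J(\theta)
= \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, Q^{\pi_\theta}(s_t, a_t) \right]
```

Entropy regularization adds a bonus term that rewards the policy for staying stochastic, discouraging premature convergence to a deterministic policy.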

Deep Deterministic Policy Gradient (DDPG)
77 The Brax Physics engine
78 Deep Deterministic Policy Gradient (DDPG)
79 DDPG pseudocode
80 Link to the code notebook
81 Deep Deterministic Policy Gradient (DDPG)
82 Create the gradient policy
83 Create the Deep Q-Network
84 Create the DDPG class
85 Define the play method
86 Set up the optimizers and data loader
87 Define the training step
88 Launch the training process
89 Check the resulting agent
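
A condensed sketch of one DDPG update; the networks, optimizers, and batch layout are assumptions for illustration, not the notebook's code:

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, mu, q, mu_target, q_target,
                actor_opt, critic_opt, gamma=0.99, tau=0.005):
    states, actions, rewards, dones, next_states = batch

    # Critic: regress Q(s, a) toward the bootstrapped target.
    with torch.no_grad():
        target = rewards + gamma * (1 - dones) * q_target(next_states, mu_target(next_states))
    critic_loss = F.mse_loss(q(states, actions), target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend the critic, i.e. minimize -Q(s, mu(s)).
    actor_loss = -q(states, mu(states)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak-average the target networks toward the online networks.
    for net, net_target in ((q, q_target), (mu, mu_target)):
        for p, pt in zip(net.parameters(), net_target.parameters()):
            pt.data.mul_(1 - tau).add_(tau * p.data)
```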

Twin Delayed DDPG (TD3)
90 Twin Delayed DDPG (TD3)
91 TD3 pseudocode
92 Link to the code notebook
93 Twin Delayed DDPG (TD3)
94 Clipped double Q-Learning
95 Delayed policy updates
96 Target policy smoothing
97 Check the resulting agent
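
TD3's first two tricks show up in its target computation. A minimal sketch, with all networks assumed to exist already:

```python
import torch

def td3_target(rewards, dones, next_states, mu_target, q1_target, q2_target,
               gamma=0.99, sigma=0.2, noise_clip=0.5, act_low=-1.0, act_high=1.0):
    with torch.no_grad():
        # Target policy smoothing: add clipped noise to the target action.
        next_actions = mu_target(next_states)
        noise = (torch.randn_like(next_actions) * sigma).clamp(-noise_clip, noise_clip)
        next_actions = (next_actions + noise).clamp(act_low, act_high)
        # Clipped double Q-Learning: bootstrap from the smaller of two critics.
        next_q = torch.min(q1_target(next_states, next_actions),
                           q2_target(next_states, next_actions))
        return rewards + gamma * (1 - dones) * next_q
```

The third trick, delayed policy updates, means the actor and the target networks are only updated once every few critic updates.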

Soft Actor-Critic (SAC)
98 Soft Actor-Critic (SAC)
99 SAC pseudocode
100 Create the robotics task
101 Create the Deep Q-Network
102 Create the gradient policy
103 Implement the Soft Actor-Critic algorithm – Part 1
104 Implement the Soft Actor-Critic algorithm – Part 2
105 Check the results
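
The defining feature of SAC is the entropy term in its bootstrapped target, where α is the temperature and the Q̄ᵢ are the target critics:

```latex
y = r + \gamma \left( \min_{i=1,2} Q_{\bar\theta_i}(s', a') - \alpha \log \pi_\phi(a' \mid s') \right),
\qquad a' \sim \pi_\phi(\cdot \mid s')
```

The actor is trained to maximize the same entropy-regularized value, which keeps exploration alive throughout training.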

Hindsight Experience Replay
106 Hindsight Experience Replay (HER)
107 Implement Hindsight Experience Replay (HER) – Part 1
108 Implement Hindsight Experience Replay (HER) – Part 2
109 Implement Hindsight Experience Replay (HER) – Part 3
110 Check the results
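
A minimal sketch of HER's "final" relabelling strategy; Transition and compute_reward are illustrative names, not the course's code:

```python
from collections import namedtuple

# A goal-conditioned transition (field names are illustrative).
Transition = namedtuple("Transition",
                        "state action reward next_state goal achieved_goal")

def her_relabel(episode, compute_reward):
    # "final" strategy: replay the episode as if the goal had been the
    # state the agent actually reached, recomputing each reward.
    final_goal = episode[-1].achieved_goal
    return [t._replace(goal=final_goal,
                       reward=compute_reward(t.achieved_goal, final_goal))
            for t in episode]
```

Failed episodes thereby become successful examples for a nearby goal, which is what lets the agent learn from sparse rewards.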

Final steps
111 Next steps
112 Next steps
