Reinforcement Learning in Motion

English | MP4 | AVC 1280×720 | AAC 44 kHz 2ch | 5h 56m | 1.60 GB

Reinforcement Learning in Motion introduces you to the exciting world of machine systems that learn from their environments! Developer, data scientist, and expert instructor Phil Tabor guides you from the basics all the way to programming your own constantly-learning AI agents. In this course, he’ll break down key concepts like how RL systems learn, how to sense and process environmental data, and how to build and train AI agents. As you learn, you’ll master the core algorithms and get to grips with tools like OpenAI Gym, NumPy, and Matplotlib.

Reinforcement learning systems learn by doing, and so will you in this hands-on course! You’ll build and train a variety of algorithms as you go, each with a specific purpose in mind. The rich and interesting examples include simulations that train a robot to escape a maze, help a mountain car get up a steep hill, and balance a pole on a sliding cart. You’ll even teach your agents how to navigate Windy Gridworld, a standard exercise in finding an optimal path even when the environment interferes with your moves!

With reinforcement learning, an AI agent learns from its environment, constantly responding to the feedback it gets. The agent optimizes its behavior to avoid negative consequences and enhance positive outcomes. The resulting algorithms are always looking for the most positive and efficient outcomes!

Importantly, with reinforcement learning you don’t need a mountain of data to get started. You just let your AI agent poke and prod its environment, which makes it much easier to take on novel research projects without well-defined training datasets.
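To make that feedback loop concrete, here is a minimal sketch of an agent poking and prodding its environment, using OpenAI Gym's cart pole task (covered later in the course). It assumes the classic Gym API (versions before 0.26, where reset() returns an observation and step() returns a 4-tuple), and it substitutes random action selection for a learned policy purely to illustrate the loop.

    import gym

    env = gym.make('CartPole-v0')          # balance a pole on a sliding cart

    for episode in range(10):
        observation = env.reset()          # start a new episode
        total_reward = 0.0
        done = False
        while not done:
            # A real agent would pick actions from a learned policy;
            # random sampling here just demonstrates the interaction loop.
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            total_reward += reward         # the feedback the agent learns from
        print('episode', episode, 'return', total_reward)

    env.close()

No dataset is involved: every observation and reward the agent sees is generated on the fly by its own interaction with the environment.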

Inside:

  • What is a reinforcement learning agent?
  • An introduction to the OpenAI Gym
  • Identifying appropriate algorithms
  • Implementing RL algorithms using NumPy
  • Visualizing performance with Matplotlib

You’ll need to be familiar with Python and machine learning basics. Examples use Python libraries like NumPy and Matplotlib. You’ll also need some understanding of linear algebra and calculus; see the equations in the Free Downloads section for examples.

Table of Contents

01 Course introduction
02 Getting Acquainted with Machine Learning
03 How Reinforcement Learning Fits In
04 Required software
05 Understanding the agent
06 Defining the environment
07 Designing the reward
08 How the agent learns
09 Choosing actions
10 Coding the environment
11 Finishing the maze-running robot problem
12 Introducing the multi-armed bandit problem
13 Action-value methods
14 Coding the multi-armed bandit test bed
15 Moving the goal posts – nonstationary problems
16 Optimistic initial values and upper confidence bound action selection
17 Wrapping up the explore-exploit dilemma
18 Introducing Markov decision processes and the frozen lake environment
19 Even robots have goals
20 Handling uncertainty with policies and value functions
21 Achieving mastery – Optimal policies and value functions
22 Skating off the frozen lake
23 Crash-landing on planet Gridworld
24 Let’s make a plan – Policy evaluation in Gridworld
25 The best laid plans – Policy improvement in the Gridworld
26 Hastening our escape with policy iteration
27 Creating a backup plan with value iteration
28 Wrapping up dynamic programming
29 The Windy Gridworld problem
30 Monte who?
31 No substitute for action – Policy evaluation with Monte Carlo methods
32 Monte Carlo control and exploring starts
33 Monte Carlo control without exploring starts
34 Off-policy Monte Carlo methods
35 Return to the frozen lake and wrapping up Monte Carlo methods
36 The cart pole problem
37 TD(0) prediction
38 On-policy TD control – SARSA
39 Off-policy TD control – Q-learning
40 Back to school with double learning
41 Wrapping up temporal difference learning
42 The continuous mountain car problem
43 Why approximation methods?
44 Stochastic gradient descent – The intuition
45 Stochastic gradient descent – The mathematics
46 Approximate Monte Carlo predictions
47 Linear methods and tiling
48 TD(0) semi-gradient prediction
49 Episodic semi-gradient control – SARSA
50 Over the hill – wrapping up approximation methods and the mountain car problem
51 Course recap
52 The frontiers of reinforcement learning
53 What to do next