Artificial Intelligence: Reinforcement Learning in Python

English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 14.5 Hours | 2.92 GB

Complete guide to Reinforcement Learning, with Stock Trading and Online Advertising Applications

When people talk about artificial intelligence, they usually don’t mean supervised and unsupervised machine learning.

These tasks are pretty trivial compared to what we think of AIs doing – playing chess and Go, driving cars, and beating video games at a superhuman level.

Reinforcement learning has recently become popular for doing all of that and more.

Much like deep learning, a lot of the theory was discovered in the 70s and 80s but it hasn’t been until recently that we’ve been able to observe first hand the amazing results that are possible.

In 2016 we saw Google’s AlphaGo beat the world champion in Go.

We saw AIs playing video games like Doom and Super Mario.

Self-driving cars have started driving on real roads with other drivers and even carrying passengers (Uber), all without human assistance.

If that sounds amazing, brace yourself for the future, because the law of accelerating returns suggests that this progress is only going to speed up.

Learning about supervised and unsupervised machine learning is no small feat. To date I have over TWENTY FIVE (25!) courses just on those topics alone.

And yet reinforcement learning opens up a whole new world. As you’ll learn in this course, the reinforcement learning paradigm is very different from both supervised and unsupervised learning.

It’s led to new and amazing insights both in behavioral psychology and neuroscience. As you’ll learn in this course, there are many analogous processes when it comes to teaching an agent and teaching an animal or even a human. It’s the closest thing we have so far to a true artificial general intelligence.

What’s covered in this course?

  • The multi-armed bandit problem and the explore-exploit dilemma
  • Ways to calculate means and moving averages and their relationship to stochastic gradient descent
  • Markov Decision Processes (MDPs)
  • Dynamic Programming
  • Monte Carlo
  • Temporal Difference (TD) Learning (Q-Learning and SARSA)
  • Approximation Methods (i.e. how to plug in a deep neural network or other differentiable model into your RL algorithm)
  • How to use OpenAI Gym, with zero code changes
  • Project: Apply Q-Learning to build a stock trading bot
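The link between sample means, moving averages, and stochastic gradient descent mentioned in the list above can be sketched in a few lines. This is a minimal illustration, not the course's own code: the function names `update_mean`, `update_moving_average`, and `sgd_step` are hypothetical, and the squared-error loss is an assumption chosen to make the connection visible.

```python
def update_mean(mean, x, n):
    """Running sample mean: fold the n-th sample x into the mean of the first n-1."""
    return mean + (x - mean) / n


def update_moving_average(estimate, x, alpha):
    """Exponentially weighted moving average with a constant step size alpha."""
    return estimate + alpha * (x - estimate)


def sgd_step(estimate, x, learning_rate):
    """One gradient descent step on the (assumed) loss 0.5 * (x - estimate)**2."""
    grad = -(x - estimate)  # derivative of the loss with respect to estimate
    return estimate - learning_rate * grad


# Hypothetical reward samples, used only for illustration.
samples = [2.0, 4.0, 6.0, 8.0]

mean = 0.0
for n, x in enumerate(samples, start=1):
    mean = update_mean(mean, x, n)

print(mean)  # matches sum(samples) / len(samples)
```

All three updates share the form "new estimate = old estimate + step size × (sample − old estimate)". With step size 1/n the result is the exact sample mean; with a constant step size it is a moving average that weights recent samples more heavily, which is one common way to track a reward that changes over time.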

If you’re ready to take on a brand new challenge, and learn about AI techniques that you’ve never seen before in traditional supervised machine learning, unsupervised machine learning, or even deep learning, then this course is for you.

What you’ll learn

  • Apply gradient-based supervised machine learning methods to reinforcement learning
  • Understand reinforcement learning on a technical level
  • Understand the relationship between reinforcement learning and psychology
  • Implement 17 different reinforcement learning algorithms

Table of Contents

Welcome
1 Introduction
2 Course Outline and Big Picture
3 Where to get the Code
4 How to Succeed in this Course
5 Warmup

Return of the Multi-Armed Bandit
6 Section Introduction: The Explore-Exploit Dilemma
7 Applications of the Explore-Exploit Dilemma
8 Epsilon-Greedy Theory
9 Calculating a Sample Mean (pt 1)
10 Epsilon-Greedy Beginner’s Exercise Prompt
11 Designing Your Bandit Program
12 Epsilon-Greedy in Code
13 Comparing Different Epsilons
14 Optimistic Initial Values Theory
15 Optimistic Initial Values Beginner’s Exercise Prompt
16 Optimistic Initial Values Code
17 UCB1 Theory
18 UCB1 Beginner’s Exercise Prompt
19 UCB1 Code
20 Bayesian Bandits Thompson Sampling Theory (pt 1)
21 Bayesian Bandits Thompson Sampling Theory (pt 2)
22 Thompson Sampling Beginner’s Exercise Prompt
23 Thompson Sampling Code
24 Thompson Sampling With Gaussian Reward Theory
25 Thompson Sampling With Gaussian Reward Code
26 Why don’t we just use a library
27 Nonstationary Bandits
28 Bandit Summary, Real Data, and Online Learning
29 (Optional) Alternative Bandit Designs
30 Suggestion Box

High Level Overview of Reinforcement Learning
31 What is Reinforcement Learning
32 From Bandits to Full Reinforcement Learning

Markov Decision Processes
33 MDP Section Introduction
34 Gridworld
35 Choosing Rewards
36 The Markov Property
37 Markov Decision Processes (MDPs)
38 Future Rewards
39 Value Functions
40 The Bellman Equation (pt 1)
41 The Bellman Equation (pt 2)
42 The Bellman Equation (pt 3)
43 Bellman Examples
44 Optimal Policy and Optimal Value Function (pt 1)
45 Optimal Policy and Optimal Value Function (pt 2)
46 MDP Summary

Dynamic Programming
47 Dynamic Programming Section Introduction
48 Iterative Policy Evaluation
49 Designing Your RL Program
50 Gridworld in Code
51 Iterative Policy Evaluation in Code
52 Windy Gridworld in Code
53 Iterative Policy Evaluation for Windy Gridworld in Code
54 Policy Improvement
55 Policy Iteration
56 Policy Iteration in Code
57 Policy Iteration in Windy Gridworld
58 Value Iteration
59 Value Iteration in Code
60 Dynamic Programming Summary

Monte Carlo
61 Monte Carlo Intro
62 Monte Carlo Policy Evaluation
63 Monte Carlo Policy Evaluation in Code
64 Monte Carlo Control
65 Monte Carlo Control in Code
66 Monte Carlo Control without Exploring Starts
67 Monte Carlo Control without Exploring Starts in Code
68 Monte Carlo Summary

Temporal Difference Learning
69 Temporal Difference Introduction
70 TD(0) Prediction
71 TD(0) Prediction in Code
72 SARSA
73 SARSA in Code
74 Q Learning
75 Q Learning in Code
76 TD Learning Section Summary

Approximation Methods
77 Approximation Methods Section Introduction
78 Linear Models for Reinforcement Learning
79 Feature Engineering
80 Approximation Methods for Prediction
81 Approximation Methods for Prediction Code
82 Approximation Methods for Control
83 Approximation Methods for Control Code
84 CartPole
85 CartPole Code
86 Approximation Methods Exercise
87 Approximation Methods Section Summary

Interlude Common Beginner Questions
88 This Course vs. RL Book: What’s the Difference?

Stock Trading Project with Reinforcement Learning
89 Beginners, halt! Stop here if you skipped ahead
90 Stock Trading Project Section Introduction
91 Data and Environment
92 How to Model Q for Q-Learning
93 Design of the Program
94 Code pt 1
95 Code pt 2
96 Code pt 3
97 Code pt 4
98 Stock Trading Project Discussion

Setting Up Your Environment (FAQ by Student Request)
99 Windows-Focused Environment Setup 2018
100 How to install Numpy, Scipy, Matplotlib, Pandas, IPython, Theano, and TensorFlow

Extra Help With Python Coding for Beginners (FAQ by Student Request)
101 How to Code by Yourself (part 1)
102 How to Code by Yourself (part 2)
103 Proof that using Jupyter Notebook is the same as not using it
104 Python 2 vs Python 3

Effective Learning Strategies for Machine Learning (FAQ by Student Request)
105 How to Succeed in this Course (Long Version)
106 Is this for Beginners or Experts? Academic or Practical? Fast or slow-paced?
107 Machine Learning and AI Prerequisite Roadmap (pt 1)
108 Machine Learning and AI Prerequisite Roadmap (pt 2)

Appendix FAQ Finale
109 What is the Appendix
110 BONUS Where to get discount coupons and FREE deep learning material
