Meta learning
Deep Reinforcement Learning
Meta learning
work in progress