13 Model-based RL
work in progress
Model-free RL caches the future into values: the value function summarizes expected return, so the agent never explicitly reasons about what comes next.
Two problems with model-free methods:
- They need a lot of samples.
- They cannot adapt to novel tasks in the same environment, because the cached values are tied to a single reward function.
Model-based RL uses an internal model of the environment to reason about the future (imagination): candidate action sequences are simulated in the model before one is executed.
It currently works best when the model is given (e.g. the game rules in AlphaGo) or easy to learn (symbolic, low-dimensional dynamics). It is not yet robust to model imperfection: prediction errors compound over simulated rollouts.
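The "reason about the future" step can be made concrete with the simplest possible planner, random shooting: sample candidate action sequences, roll each one out through the internal model, and execute the first action of the best sequence. A minimal sketch; the function names and the `toy_model` environment are illustrative, not taken from any of the papers below:

```python
import random

def plan_by_imagination(model_step, s0, n_actions, horizon=3,
                        n_rollouts=100, gamma=0.95, seed=0):
    """Random-shooting planner: sample random action sequences, roll each
    out through the internal model, return the first action of the best."""
    rng = random.Random(seed)
    best_return, best_first = float("-inf"), 0
    for _ in range(n_rollouts):
        s, ret, discount, first = s0, 0.0, 1.0, None
        for t in range(horizon):
            a = rng.randrange(n_actions)
            if t == 0:
                first = a
            r, s = model_step(s, a)  # imagined transition; the real env is never called
            ret += discount * r
            discount *= gamma
        if ret > best_return:
            best_return, best_first = ret, first
    return best_first

# Toy internal model (assumed for illustration): integer position, action 1
# moves right, action 0 moves left; reward is negative distance to a goal at 10.
def toy_model(s, a):
    s2 = s + (1 if a == 1 else -1)
    return -abs(s2 - 10), s2

a = plan_by_imagination(toy_model, s0=0, n_actions=2)
```

The model is queried `n_rollouts * horizon` times per decision, but the real environment only once; that trade is the sample-efficiency argument above.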
13.1 Dyna-Q
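Dyna-Q interleaves (a) direct Q-learning updates on real transitions, (b) learning a model from those transitions, and (c) extra planning updates on transitions replayed from the model. A minimal tabular sketch, assuming a deterministic environment; the chain environment and hyperparameters are illustrative:

```python
import random

def dyna_q(env_step, n_states, n_actions, episodes=50, planning_steps=10,
           alpha=0.1, gamma=0.95, eps=0.1, start=0, seed=0):
    """Tabular Dyna-Q: Q-learning on real transitions, plus extra planning
    updates on transitions replayed from a learned deterministic model."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # (s, a) -> (r, s', done), filled from real experience

    def greedy(s):  # greedy action with random tie-breaking
        best = max(Q[s])
        return rng.choice([a for a in range(n_actions) if Q[s][a] == best])

    for _ in range(episodes):
        s, done = start, False
        while not done:
            a = rng.randrange(n_actions) if rng.random() < eps else greedy(s)
            r, s2, done = env_step(s, a)
            # (a) direct RL: one Q-learning update from the real transition
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            # (b) model learning: remember the observed transition
            model[(s, a)] = (r, s2, done)
            # (c) planning: extra updates from transitions sampled from the model
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2, pdone) = rng.choice(list(model.items()))
                ptarget = pr + (0.0 if pdone else gamma * max(Q[ps2]))
                Q[ps][pa] += alpha * (ptarget - Q[ps][pa])
            s = s2
    return Q

# Toy chain environment (assumed for illustration): 5 states in a row,
# action 1 moves right, 0 moves left; the rightmost state gives reward 1
# and ends the episode.
def chain_step(s, a, n=5):
    s2 = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
    done = s2 == n - 1
    return (1.0 if done else 0.0), s2, done

Q = dyna_q(chain_step, n_states=5, n_actions=2)
```

With `planning_steps=0` this reduces to plain Q-learning; increasing it squeezes more value-propagation out of each real step, which is the sample-efficiency motivation above.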
13.2 Unsorted references
Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images (Watter et al., 2015)
Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation (Corneil et al., 2018)
Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning (Feinberg et al., 2018)
Imagination-Augmented Agents for Deep Reinforcement Learning (Weber et al., 2017)
Temporal Difference Models (TDM) (Pong et al., 2018): http://bair.berkeley.edu/blog/2018/04/26/tdm/
Learning to Adapt: Meta-Learning for Model-Based Control (Clavera et al., 2018)
The Predictron: End-To-End Learning and Planning (Silver et al., 2016)
Model-Based Planning with Discrete and Continuous Actions (Henaff et al., 2017)
Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics (Kansky et al., 2017)
Universal Planning Networks (Srinivas et al., 2018)
World Models (Ha and Schmidhuber, 2018): https://worldmodels.github.io/
Recall Traces: Backtracking Models for Efficient Reinforcement Learning (Goyal et al., 2018)
Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning (Peng et al., 2018)
Q-map: a Convolutional Approach for Goal-Oriented Reinforcement Learning (Pardo et al., 2018)