13  Model-based RL

work in progress

Model-free: the future is cached into values. A value estimate such as Q(s, a) already summarizes the expected discounted return from that point on, so the agent never has to predict what will actually happen next.
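
Concretely, "caching the future" is what the standard Q-learning update does: the one-step reward and the bootstrapped estimate of everything that follows are folded into a single number,

    Q(s, a) ← Q(s, a) + α [ r + γ max_a' Q(s', a') − Q(s, a) ]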

Two problems of model-free methods:

  1. Sample inefficiency: learning accurate values requires a very large number of interactions with the environment.
  2. No transfer: the values entangle dynamics and reward, so they cannot be reused to adapt to novel tasks in the same environment.

Model-based RL uses an internal model of the environment to reason about the future before acting (imagination / planning).

It works well so far only when the model is given and exact (the game rules in AlphaGo) or easy to learn (symbolic, low-dimensional dynamics). Planning with a learned, imperfect model is not yet robust: model errors compound over the imagined rollouts.
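
As a concrete illustration, here is a minimal sketch of planning by imagination: a random-shooting loop (model-predictive control) that scores candidate action sequences inside a dynamics and reward model instead of the real environment. The names plan/f/r and the toy models are illustrative assumptions, not taken from any specific paper below.

    # Plan by rolling imagined action sequences forward in a model f(s, a) -> s'
    # with reward model r(s, a); no real environment steps are used.
    import numpy as np

    def plan(s0, f, r, horizon=10, n_candidates=100, action_dim=1, rng=None):
        """Return the first action of the best imagined action sequence."""
        rng = rng or np.random.default_rng()
        best_return, best_action = -np.inf, None
        for _ in range(n_candidates):
            actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
            s, total = s0, 0.0
            for a in actions:              # rollout happens in the model only
                total += r(s, a)
                s = f(s, a)
            if total > best_return:
                best_return, best_action = total, actions[0]
        return best_action                 # re-planned at every real step (MPC)

    # Toy example: 1-D point mass that should stay at the origin.
    f = lambda s, a: s + 0.1 * a                # assumed (perfect) dynamics model
    r = lambda s, a: -float(np.abs(s).sum())    # assumed reward model
    print(plan(np.array([1.0]), f, r))

With a learned f, the quality of the plan is bounded by the quality of the model, which is exactly the robustness issue mentioned above.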

13.1 Dyna-Q

(Sutton, 1990)

https://medium.com/@ranko.mosic/online-planning-agent-dyna-q-algorithm-and-dyna-maze-example-sutton-and-barto-2016-7ad84a6dc52b
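
A minimal tabular sketch of the idea (a hedged illustration, not a reference implementation): each real transition is used twice, once for a direct Q-learning update and once to train a deterministic lookup-table model; n_planning extra updates then replay transitions sampled from that model. The env.reset()/env.step() interface follows the classic Gym convention, and discrete, hashable states are assumed.

    import random
    from collections import defaultdict

    def dyna_q(env, n_actions, episodes=100, n_planning=10,
               alpha=0.1, gamma=0.95, epsilon=0.1):
        Q = defaultdict(float)     # Q[(s, a)] -> value estimate
        model = {}                 # model[(s, a)] -> (r, s', done), deterministic
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # epsilon-greedy action selection on the current value estimates
                if random.random() < epsilon:
                    a = random.randrange(n_actions)
                else:
                    a = max(range(n_actions), key=lambda a_: Q[(s, a_)])
                s2, r, done, _ = env.step(a)
                # 1) direct RL: Q-learning update from the real transition
                target = r if done else r + gamma * max(Q[(s2, a_)] for a_ in range(n_actions))
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                # 2) model learning: remember the last outcome of (s, a)
                model[(s, a)] = (r, s2, done)
                # 3) planning: n_planning updates from transitions imagined by the model
                for _ in range(n_planning):
                    (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
                    ptarget = pr if pdone else pr + gamma * max(Q[(ps2, a_)] for a_ in range(n_actions))
                    Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
                s = s2
        return Q

With n_planning = 0 this reduces to plain Q-learning; increasing it trades computation for sample efficiency, which is the point of Dyna.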

13.2 Unsorted references

Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images (Watter et al., 2015)

Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation (Corneil et al., 2018)

Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning (Feinberg et al., 2018)

Imagination-Augmented Agents for Deep Reinforcement Learning (Weber et al., 2017)

Temporal Difference Models (TDM) (Pong et al., 2018): http://bair.berkeley.edu/blog/2018/04/26/tdm/

Learning to Adapt: Meta-Learning for Model-Based Control (Clavera et al., 2018)

The Predictron: End-To-End Learning and Planning (Silver et al., 2016)

Model-Based Planning with Discrete and Continuous Actions (Henaff et al., 2017)

Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics (Kansky et al., 2017)

Universal Planning Networks (Srinivas et al., 2018)

World Models (Ha and Schmidhuber, 2018): https://worldmodels.github.io/

Recall Traces: Backtracking Models for Efficient Reinforcement Learning (Goyal et al., 2018)

Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning (Peng et al., 2018)

Q-map: a Convolutional Approach for Goal-Oriented Reinforcement Learning (Pardo et al., 2018)