Gu, S., Lillicrap, T., Ghahramani, Z., Turner, R. E., and Levine, S. (2016a). Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic. Available at: http://arxiv.org/abs/1611.02247.
Gu, S., Lillicrap, T., Sutskever, I., and Levine, S. (2016b). Continuous Deep Q-Learning with Model-based Acceleration. Available at: http://arxiv.org/abs/1603.00748.
Heess, N., Wayne, G., Silver, D., Lillicrap, T., Tassa, Y., and Erez, T. (2015). Learning continuous control policies by stochastic value gradients. Proc. International Conference on Neural Information Processing Systems, 2944–2952. Available at: http://dl.acm.org/citation.cfm?id=2969569.
Heinrich, J., Lanctot, M., and Silver, D. (2015). Fictitious Self-Play in Extensive-Form Games. 805–813. Available at: http://proceedings.mlr.press/v37/heinrich15.html.
Heinrich, J., and Silver, D. (2016). Deep Reinforcement Learning from Self-Play in Imperfect-Information Games. Available at: http://arxiv.org/abs/1603.01121.