You will find below links to download the notebooks for the exercises (which you have to fill in) and their solutions (which you can look at after you have finished the exercise). It is recommended not to look at the solution while doing an exercise unless you are stuck.

Alternatively, you can run the notebooks directly on Colab (https://colab.research.google.com/) if you have a Google account.

You will also find videos presenting the exercises and commenting on their solutions.

The solution to each exercise is rendered in the following pages.

Introduction to Python

This exercise is an introduction to Python for absolute beginners. If you already know Python, you can safely skip it.

Presentation

Commented solution

Numpy and Matplotlib

The goal of this exercise is to present the basics of the numerical library numpy as well as the visualization library matplotlib.
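
As a small taste of what the notebook covers, here is a minimal sketch (the exact contents of the exercise may differ): vectorized computation with numpy and the basic matplotlib plotting workflow.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                     # headless backend: no display needed
import matplotlib.pyplot as plt

# Vectorized computation: numpy applies operations to whole arrays at once
x = np.linspace(0.0, 2.0 * np.pi, 100)    # 100 evenly spaced points in [0, 2*pi]
y = np.sin(x)
print(y.shape)                            # (100,)

# Basic matplotlib workflow: plot, label, save
plt.plot(x, y, label="sin(x)")
plt.xlabel("x")
plt.legend()
plt.savefig("sine.png")
```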

Presentation

Commented solution

Sampling

The goal of this exercise is to understand how to sample rewards from an n-armed bandit and to understand the central limit theorem.
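
As a rough sketch of the idea (the arm distributions and parameters below are invented for illustration, not taken from the exercise): sampling rewards from one arm many times and averaging illustrates the central limit theorem.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 3-armed bandit: each arm returns Gaussian rewards around its
# true mean (the actual distributions used in the exercise may differ)
true_means = np.array([0.5, 1.0, 1.5])

def sample_reward(arm):
    """Draw one noisy reward from the chosen arm."""
    return rng.normal(true_means[arm], 1.0)

# Central limit theorem: the mean of n i.i.d. rewards is approximately
# normally distributed around the true mean, with std sigma / sqrt(n)
n = 1000
means = [np.mean([sample_reward(1) for _ in range(n)]) for _ in range(200)]
print(float(np.mean(means)))   # close to the true mean 1.0
print(float(np.std(means)))    # close to 1 / sqrt(1000), about 0.032
```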

Presentation

Commented solution

Bandits - part 1

The goal of this exercise is to implement simple action selection mechanisms for the n-armed bandit:

• Greedy action selection
• ε-greedy action selection
• Softmax action selection
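
The three mechanisms can be sketched as follows, assuming the value estimates are stored in a numpy array `Q` (the notebook's actual data structures and parameter names may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def greedy(Q):
    """Always exploit: pick the action with the highest estimated value."""
    return int(np.argmax(Q))

def epsilon_greedy(Q, epsilon=0.1):
    """Explore uniformly with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(len(Q)))
    return int(np.argmax(Q))

def softmax(Q, tau=1.0):
    """Sample an action with probability proportional to exp(Q / tau)."""
    logits = (Q - np.max(Q)) / tau                    # shift for numerical stability
    probs = np.exp(logits) / np.sum(np.exp(logits))
    return int(rng.choice(len(Q), p=probs))

Q = np.array([0.2, 0.8, 0.5])   # hypothetical value estimates for 3 arms
print(greedy(Q))                 # 1
```

The temperature `tau` plays the role that `epsilon` plays for ε-greedy: higher values mean more exploration.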

Presentation

Commented solution

Bandits - part 2

The goal of this exercise is to further investigate the properties of the action selection algorithms for the n-armed bandit.

Presentation

Commented solution

Dynamic programming

The goal of this exercise is to apply policy iteration and value iteration on the recycling robot MDP.
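
The value-iteration loop can be sketched on a tiny made-up MDP (this is NOT the recycling robot from the exercise; `P` and `R` below are invented for illustration):

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP: P[s, a, s'] are transition
# probabilities, R[s, a] are expected immediate rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9

# Value iteration: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s,a,s') V(s') ]
V = np.zeros(2)
for _ in range(200):
    Q = R + gamma * P @ V          # Q[s, a], computed for all pairs at once
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break                      # Bellman residual small enough: converged
    V = V_new

policy = Q.argmax(axis=1)          # greedy policy w.r.t. the converged values
print(V, policy)
```

Policy iteration reaches the same fixed point but alternates full policy evaluation with greedy policy improvement instead of taking the max at every sweep.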

Presentation

Commented solution

Gym

The goal of this exercise is to install gym and learn how to use its interface.

Presentation

Commented solution

Monte Carlo control

The goal of this exercise is to implement on-policy Monte-Carlo control on the Taxi gym environment.
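
The core first-visit Monte Carlo update can be sketched on a single hand-written episode (the episode, states, and discount factor below are invented for illustration, not taken from the Taxi environment):

```python
from collections import defaultdict

gamma = 0.99

# One hypothetical episode as (state, action, reward) triples
episode = [(0, 1, 0.0), (1, 0, 0.0), (2, 1, 1.0)]

Q = defaultdict(float)    # Q[(state, action)] value estimates
N = defaultdict(int)      # visit counts, for incremental averaging

# Record the first time step at which each (state, action) pair occurs
first_visit = {}
for t, (s, a, r) in enumerate(episode):
    if (s, a) not in first_visit:
        first_visit[(s, a)] = t

# Walk the episode backwards accumulating the discounted return G,
# and average it into Q at first visits only
G = 0.0
for t in reversed(range(len(episode))):
    s, a, r = episode[t]
    G = gamma * G + r
    if first_visit[(s, a)] == t:
        N[(s, a)] += 1
        Q[(s, a)] += (G - Q[(s, a)]) / N[(s, a)]

print(Q[(0, 1)])   # gamma^2 * 1.0 = 0.9801: the final reward, discounted twice
```

In full on-policy control, these updates alternate with acting ε-greedily with respect to the current `Q`.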

Presentation

Commented solution

Temporal difference

The goal of this exercise is to implement Q-learning on the Taxi gym environment.
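
The tabular Q-learning update can be sketched on a small hand-coded chain environment, used here as a stand-in for illustration (the exercise itself uses the Taxi environment from gym):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-state chain (NOT Taxi): action 1 moves right, action 0 moves
# left; reaching the last state yields reward 1 and ends the episode.
n_states, n_actions = 5, 2
alpha, gamma, epsilon = 0.1, 0.99, 0.1
Q = np.zeros((n_states, n_actions))

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done

for _ in range(500):
    s = int(rng.integers(n_states - 1))   # random non-terminal start state
    done = False
    while not done:
        # epsilon-greedy behaviour policy
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next, r, done = step(s, a)
        # Q-learning update: bootstrap from the GREEDY value of the next state
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

print(Q.argmax(axis=1)[:4])   # learned greedy policy: move right in every state
```

Because the target uses the greedy value of the next state rather than the action actually taken, Q-learning is off-policy: it learns the optimal policy while following an exploratory one.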

Presentation

Commented solution

Eligibility traces

The goal of this exercise is to implement Q-learning with eligibility traces on the Gridworld environment.
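
The Watkins-style Q(λ) update can be sketched as follows; the sizes, transitions, and parameter values below are invented for illustration (the exercise uses the Gridworld environment):

```python
import numpy as np

# Hypothetical sizes, NOT the Gridworld from the exercise
n_states, n_actions = 4, 2
alpha, gamma, lam = 0.1, 0.9, 0.8

Q = np.zeros((n_states, n_actions))
E = np.zeros_like(Q)            # eligibility traces, one per (state, action)

def q_lambda_step(s, a, r, s_next, greedy_action):
    """Update ALL traced pairs from one transition, not just (s, a)."""
    global E
    delta = r + gamma * Q[s_next].max() - Q[s, a]   # TD error
    E[s, a] += 1.0                                  # accumulating trace
    Q[...] += alpha * delta * E                     # credit every traced pair
    if a == greedy_action:
        E *= gamma * lam                            # decay all traces
    else:
        E[...] = 0.0     # Watkins: cut traces after an exploratory action

q_lambda_step(0, 1, 0.0, 1, greedy_action=1)
q_lambda_step(1, 1, 1.0, 2, greedy_action=1)
print(Q[0, 1])   # about 0.072: the reward also credits the earlier pair via its trace
```

Without traces, the first pair (0, 1) would not be updated by the reward observed one step later; the trace propagates credit backwards in a single update.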

Presentation

Commented solution

Keras

The goal of this exercise is to quickly discover Keras and to understand why neural networks trained with SGD need i.i.d. samples.

Presentation

Commented solution

DQN

The goal of this exercise is to implement DQN on the Cartpole balancing problem.