# List of exercises

Below you will find links to download the notebooks for the exercises (which you have to fill in) and their solutions (which you can look at after you have finished the exercise). It is recommended not to look at the solution while doing an exercise unless you are completely stuck.

Alternatively, you can run the notebooks directly on Colab (https://colab.research.google.com/) if you have a Google account.

You will also find videos presenting the exercises and commenting on their solutions.

The solution to each exercise is rendered on the following pages.

## Introduction to Python

This exercise is an introduction to Python for absolute beginners. If you already know Python, you can safely skip it.

## Numpy and Matplotlib

The goal of this exercise is to present the basics of the numerical library numpy as well as the visualization library matplotlib.
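As a small taste of what the exercise covers, here is a minimal numpy sketch (the matplotlib plotting part is left out here): arrays support element-wise, vectorized operations, so explicit loops over elements are rarely needed.

```python
import numpy as np

# 100 evenly spaced points between 0 and 2*pi.
x = np.linspace(0.0, 2.0 * np.pi, 100)

# np.sin is applied element-wise to the whole array at once.
y = np.sin(x)

# In the exercise, matplotlib's plt.plot(x, y) would visualize this curve.
print(y.shape, y.max())
```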

## Sampling

The goal of this exercise is to understand how to sample rewards from an n-armed bandit and to understand the central limit theorem.
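A minimal sketch of the sampling part, assuming a hypothetical 5-armed Gaussian bandit (the exercise's own bandit definition may differ): by the central limit theorem, the sample mean of the rewards of one arm concentrates around its true mean, with an error shrinking like 1/sqrt(N).

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 5-armed bandit: the true mean reward of each arm.
true_means = rng.normal(0.0, 1.0, size=5)

def sample_reward(arm):
    # A reward is the arm's true mean plus unit Gaussian noise.
    return true_means[arm] + rng.normal()

# Estimate the mean reward of arm 0 from N samples. By the CLT, the
# sample mean is approximately Gaussian around true_means[0] with
# standard deviation 1 / sqrt(N).
N = 10000
estimate = np.mean([sample_reward(0) for _ in range(N)])
error = abs(estimate - true_means[0])
print(f"estimation error: {error:.4f}")
```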

## Bandits - part 1

The goal of this exercise is to implement simple action selection mechanisms for the n-armed bandit:

- Greedy action selection
- ε-greedy action selection
- Softmax action selection
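The three mechanisms can be sketched as follows (an outline, not the exercise's solution; `Q` holds hypothetical value estimates for 4 arms):

```python
import numpy as np

rng = np.random.default_rng(0)

Q = np.array([0.1, 0.5, 0.3, 0.2])  # hypothetical value estimates

def greedy(Q):
    # Always pick the arm with the highest estimated value.
    return int(np.argmax(Q))

def epsilon_greedy(Q, epsilon=0.1):
    # With probability epsilon, explore uniformly; otherwise exploit.
    if rng.random() < epsilon:
        return int(rng.integers(len(Q)))
    return int(np.argmax(Q))

def softmax(Q, tau=1.0):
    # Sample an arm with probability proportional to exp(Q / tau);
    # tau is the temperature controlling exploration.
    logits = Q / tau
    p = np.exp(logits - logits.max())  # subtract max for numerical stability
    p /= p.sum()
    return int(rng.choice(len(Q), p=p))
```

Lowering `tau` makes softmax behave more greedily; raising it makes the choice more uniform.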

## Bandits - part 2

The goal of this exercise is to further investigate the properties of the action selection algorithms for the n-armed bandit.

## Dynamic programming

The goal of this exercise is to apply policy iteration and value iteration on the recycling robot MDP.
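Value iteration itself can be sketched on a tiny hypothetical 2-state, 2-action MDP (not the recycling robot, whose transition probabilities and rewards are given in the exercise): repeatedly apply the Bellman optimality backup until the value function stops changing.

```python
import numpy as np

# Hypothetical tabular MDP: P[s, a, s'] is the transition probability,
# R[s, a] the expected immediate reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
    Q = R + gamma * P @ V          # shape (2, 2): one value per (state, action)
    V_new = Q.max(axis=1)          # V(s) = max_a Q(s, a)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break                      # converged
    V = V_new

policy = Q.argmax(axis=1)          # greedy policy w.r.t. the final values
```

Policy iteration alternates full policy evaluation and greedy improvement instead of taking the max at every sweep, but converges to the same optimal values.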

## Gym

The goal of this exercise is to install the gym library and to learn how to use its interface.

## Monte Carlo control

The goal of this exercise is to implement on-policy Monte Carlo control on the Taxi gym environment.
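The structure of the algorithm can be sketched on a toy 3-state corridor instead of Taxi (which requires the gym package); this uses the every-visit variant with an ε-greedy policy, and is only an outline of the method, not the exercise's solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corridor: states 0..2, actions 0 = left and 1 = right,
# reward +1 when reaching the right end (state 2, terminal).
n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))
counts = np.zeros((n_states, n_actions))
gamma, epsilon = 0.9, 0.2

for episode in range(2000):
    # 1) Generate a full episode with the current epsilon-greedy policy.
    trajectory, s = [], 0
    while s != n_states - 1:
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        trajectory.append((s, a, r))
        s = s_next
    # 2) Walk the episode backwards, accumulating the discounted return G,
    #    and update Q(s, a) toward the running average of observed returns.
    G = 0.0
    for s, a, r in reversed(trajectory):
        G = r + gamma * G
        counts[s, a] += 1
        Q[s, a] += (G - Q[s, a]) / counts[s, a]
```

The policy improves implicitly: as `Q` changes, the ε-greedy policy used to generate the next episode changes with it (on-policy control).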

## Temporal difference

The goal of this exercise is to implement Q-learning on the Taxi gym environment.
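The core Q-learning update can be sketched on a toy 5-state corridor (a stand-in for Taxi, which requires the gym package); the environment here is hypothetical, but the update rule is the standard one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corridor: states 0..4, actions 0 = left and 1 = right,
# reward +1 when reaching the right end (state 4, terminal).
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2

for episode in range(2000):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy behaviour policy
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: off-policy, bootstraps on max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

greedy_policy = Q.argmax(axis=1)   # should point right in states 0..3
```

Replacing `Q[s_next].max()` with the value of the action actually taken next would turn this into SARSA, the on-policy variant.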

## Eligibility traces

The goal of this exercise is to implement Q-learning with eligibility traces on the Gridworld environment.
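The trace mechanism can be sketched with Watkins's Q(λ) on a toy 5-state corridor rather than the exercise's Gridworld (the environment here is hypothetical): a TD error is propagated backwards along recently visited state-action pairs, weighted by a decaying eligibility trace.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corridor: states 0..4, actions 0 = left and 1 = right,
# reward +1 when reaching the right end (state 4, terminal).
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, lam, epsilon = 0.1, 0.9, 0.8, 0.2

for episode in range(2000):
    E = np.zeros_like(Q)   # eligibility traces, reset at episode start
    s = 0
    while s != n_states - 1:
        greedy_a = int(Q[s].argmax())
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else greedy_a
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        delta = r + gamma * Q[s_next].max() - Q[s, a]  # TD error
        E[s, a] += 1.0            # accumulating trace for the visited pair
        Q += alpha * delta * E    # propagate the TD error along the trace
        if a == greedy_a:
            E *= gamma * lam      # decay all traces after a greedy action
        else:
            E[:] = 0.0            # Watkins: cut traces after exploration
        s = s_next
```

Compared with plain Q-learning, the trace lets a single reward update the whole recent trajectory at once, which typically speeds up credit assignment.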

## Keras

The goal of this exercise is to quickly discover Keras and to understand why neural networks (and SGD) need i.i.d. samples.
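The i.i.d. point can be illustrated without Keras at all: plain SGD fed a correlated (sorted) stream of targets ends up fitting only the most recent ones, while a shuffled (roughly i.i.d.) stream averages correctly. This is a toy illustration, not the exercise's code, and it is one motivation for the experience replay buffer used later in DQN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate the mean of targets y = +/-1 with a single parameter w,
# using plain SGD on the squared error (true mean is 0).
y = np.concatenate([np.ones(500), -np.ones(500)])

def sgd_mean(targets, lr=0.05):
    w = 0.0
    for t in targets:
        w += lr * (t - w)   # gradient step on (t - w)^2 / 2
    return w

w_sorted = sgd_mean(y)                     # correlated stream: all +1, then all -1
w_shuffled = sgd_mean(rng.permutation(y))  # roughly i.i.d. stream
```

On the sorted stream, `w` first converges to +1 and is then dragged to -1, "forgetting" the first half of the data; on the shuffled stream it stays near the true mean 0.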

## DQN

The goal of this exercise is to implement DQN on the Cartpole balancing problem.

## PPO

The goal of this exercise is to use the tianshou library to implement PPO on the Cartpole balancing problem.