3.1. Linear regression¶
The goal of this exercise is to implement the least mean squares algorithm (LMS) for linear regression seen in the course.
We start by importing numpy and matplotlib.
import numpy as np
import matplotlib.pyplot as plt
3.1.1. Least mean squares¶
To generate the data for the exercise, we will use the scikit-learn library (https://scikit-learn.org). It provides a huge selection of already implemented machine learning algorithms for classification, regression or clustering.
If you use Anaconda or Colab, scikit-learn should already be installed. Otherwise, install it with pip (you may need to restart this notebook afterwards):
pip install scikit-learn
We will use the method sklearn.datasets.make_regression to generate the data. The documentation of this method is available at https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html.
The following cell imports the method:
from sklearn.datasets import make_regression
We can now generate the data. We start with the simplest case where the inputs have only one dimension. We will generate 100 samples \((x_i, t_i)\) linked by a linear relationship and some noise.
The following code generates the data:
N = 100
X, t = make_regression(n_samples=N, n_features=1, noise=15.0)
n_samples is the number of generated samples, n_features is the number of input variables, and noise quantifies how much the points deviate from the linear relationship.
Q: Print the shape of the arrays X and t to better understand what is generated. Visualize the dataset using matplotlib (plt.scatter). Vary the value of the noise argument in the previous cell and visualize the data again.
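A minimal sketch of how this could look, using the data generated above:

print(X.shape)  # (100, 1): 100 samples with a single input dimension
print(t.shape)  # (100,): one target value per sample

plt.figure()
plt.scatter(X[:, 0], t)
plt.xlabel("x")
plt.ylabel("t")
plt.show()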
Now is the time to implement the LMS algorithm with numpy.
Remember the LMS algorithm from the course:
\(w=0 \quad;\quad b=0\)

for M epochs:

    \(dw=0 \quad;\quad db=0\)

    for each sample \((x_i, t_i)\):

        \(y_i = w \, x_i + b\)

        \(dw = dw + (t_i - y_i) \, x_i\)

        \(db = db + (t_i - y_i)\)

    \(\Delta w = \eta \, \frac{1}{N} dw\)

    \(\Delta b = \eta \, \frac{1}{N} db\)
Our linear model \(y = w \, x + b\) predicts outputs for an input \(x\). The error \(t-y\) between the prediction and the data is used to adapt the weight \(w\) and the bias \(b\) at the end of each epoch.
Q: Implement the LMS algorithm and apply it to the generated data. The Python code that you will write is almost a line-by-line translation of the pseudo-code above. Use a learning rate eta = 0.1 at first, but you will vary this value later. Start by running a single epoch, as it will be easier to debug, and then increase the number of epochs to 100. Print the value of the weight and the bias at the end.
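One possible implementation, assuming X has shape (N, 1) and t has shape (N,) as generated above:

eta = 0.1  # learning rate
M = 100    # number of epochs

w = 0.0
b = 0.0

for epoch in range(M):
    dw, db = 0.0, 0.0
    for i in range(N):
        y_i = w * X[i, 0] + b         # prediction for the i-th sample
        dw += (t[i] - y_i) * X[i, 0]  # accumulate the updates
        db += (t[i] - y_i)
    # Apply the accumulated updates at the end of the epoch
    w += eta * dw / N
    b += eta * db / N

print(w, b)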
Q: Visualize the quality of the fit by superimposing the learned model on the data with matplotlib.
Tip: you can get the extreme values of the x-axis with X.min() and X.max(). To visualize the model, you just need to plot a line between the points (X.min(), w*X.min()+b) and (X.max(), w*X.max()+b).
Another option is to predict a value for all inputs and plot this vector \(y\) against the desired values \(t\).
Q: Make a scatter plot where \(t\) is the x-axis and \(y = w\, x + b\) is the y-axis. How should the points be arranged in the ideal case? Also plot what this ideal relationship should be.
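A possible sketch for both visualizations, reusing the w and b found by your implementation:

# Learned line superimposed on the data
plt.figure()
plt.scatter(X[:, 0], t)
plt.plot([X.min(), X.max()], [w * X.min() + b, w * X.max() + b], color="red")
plt.show()

# Predictions against targets: ideally, all points lie on the diagonal y = t
y = w * X[:, 0] + b
plt.figure()
plt.scatter(t, y)
plt.plot([t.min(), t.max()], [t.min(), t.max()], color="red")
plt.xlabel("t")
plt.ylabel("y")
plt.show()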
A much better method to analyse the result of the learning algorithm is to track the mean squared error (mse) after each epoch, i.e. the loss function which we actually want to minimize. The MSE is defined as:

\(\text{mse} = \frac{1}{N} \, \sum_{i=1}^N (t_i - y_i)^2\)
Q: Modify your LMS algorithm (either directly or copy it in the next cell) to track the mse after each epoch. After each epoch, append the mse on the training set to a list (initially empty) and plot it at the end. How does the mse evolve? Which value does it get in the end? Why? How many epochs do you actually need?
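A sketch of the modified loop (same algorithm as before, only the bookkeeping changes):

eta = 0.1
M = 100
w, b = 0.0, 0.0
losses = []

for epoch in range(M):
    dw, db = 0.0, 0.0
    for i in range(N):
        y_i = w * X[i, 0] + b
        dw += (t[i] - y_i) * X[i, 0]
        db += (t[i] - y_i)
    w += eta * dw / N
    b += eta * db / N
    # mse on the training set with the current parameters
    y = w * X[:, 0] + b
    losses.append(np.mean((t - y)**2))

plt.figure()
plt.plot(losses)
plt.xlabel("Epoch")
plt.ylabel("mse")
plt.show()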
Let’s now study the influence of the learning rate: eta = 0.1 seemed to work, but is it the best value?
Q: Iterate over multiple values of eta using a logarithmic scale and plot the final mse after 100 epochs as a function of the learning rate. Conclude.
Hint: the logarithmic scale means that you will try values such as \(10^{-5}\), \(10^{-4}\), \(10^{-3}\), etc., up to 1.0. In Python, you can either write 0.0001 explicitly or use the notation 1e-4. For the plot, use np.log10(eta) to only display the exponent on the x-axis.
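A possible structure for the sweep, assuming you have wrapped your LMS code in a (hypothetical) function lms(X, t, eta, M) that returns the final mse:

etas = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
mses = [lms(X, t, eta, M=100) for eta in etas]  # lms() is your own wrapper, not a library call

plt.figure()
plt.plot(np.log10(etas), mses)
plt.xlabel("log10(eta)")
plt.ylabel("Final mse")
plt.show()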
3.1.2. Scikit-learn¶
The code that you have written is functional, but extremely slow, as it relies on Python for loops. With so few samples, it does not make a difference, but with millions of samples, this would start to be a problem.
The solution is to use optimized implementations of the algorithms, running in C++ or FORTRAN under the hood. We will use here the linear regression provided by scikit-learn, as you have already installed it and it is very simple to use. Note that one could use tensorflow too, but that would be killing a fly with a sledgehammer.
scikit-learn provides a LinearRegression object that implements ordinary least squares regression. The documentation is at: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html.
You simply import it with:
from sklearn.linear_model import LinearRegression
You create the object with:
reg = LinearRegression()
reg is now an object with different methods (fit(), predict()) that accept any kind of data and perform linear regression.
To train the model on the data \((X, t)\), simply use:
reg.fit(X, t)
The parameters of the model are obtained with reg.coef_ for \(w\) and reg.intercept_ for \(b\).
You can predict outputs for new inputs using:
y = reg.predict(X)
Q: Apply linear regression on the data using scikit-learn. Check the model parameters after learning and compare them to what you obtained previously. Print the mse and make a plot comparing the predictions with the data.
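Putting the previous snippets together, a minimal sketch could be:

from sklearn.linear_model import LinearRegression

reg = LinearRegression()
reg.fit(X, t)
print(reg.coef_, reg.intercept_)  # compare with the w and b found previously

y = reg.predict(X)
print(np.mean((t - y)**2))  # mse of the fitted model

plt.figure()
plt.scatter(X[:, 0], t, label="data")
plt.plot(X[:, 0], y, color="red", label="model")
plt.legend()
plt.show()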
3.1.3. Delta learning rule¶
Let’s now implement the online version of LMS, the delta learning rule. The only difference is that the parameter updates are applied immediately after each example is evaluated, instead of being accumulated until the end of the epoch.
\(w=0 \quad;\quad b=0\)

for M epochs:

    for each sample \((x_i, t_i)\):

        \(y_i = w \, x_i + b\)

        \(\Delta w = \eta \, (t_i - y_i ) \, x_i\)

        \(\Delta b = \eta \, (t_i - y_i)\)
Q: Implement the delta learning rule for the regression problem with eta = 0.1. Plot the evolution of the mse and compare it to LMS.
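One possible sketch, assuming the same data as before:

eta = 0.1
M = 100
w, b = 0.0, 0.0
losses = []

for epoch in range(M):
    for i in range(N):
        y_i = w * X[i, 0] + b
        # The update is applied immediately after each example
        w += eta * (t[i] - y_i) * X[i, 0]
        b += eta * (t[i] - y_i)
    y = w * X[:, 0] + b
    losses.append(np.mean((t - y)**2))

plt.figure()
plt.plot(losses)
plt.xlabel("Epoch")
plt.ylabel("mse")
plt.show()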
Q: Vary the learning rate logarithmically as for LMS and conclude.