import numpy as np
import matplotlib.pyplot as plt
Keras tutorial
The goal of this tutorial is to quickly present keras, the high-level API of tensorflow, which has already been used in the Neurocomputing exercises. We will train a small fully-connected network on MNIST and observe what happens when the inputs or outputs are correlated, by training successively on the 0 digits, then on the 1s, etc.
Keras
The first step is to install tensorflow. The easiest way is to use pip:
pip install tensorflow
keras is now available as a submodule of tensorflow (you can also install it as a separate package):
import tensorflow as tf
Keras provides a lot of ready-made layer types, activation functions, optimizers and so on. Do not hesitate to read its documentation on https://keras.io.
The most important object in keras is Sequential. It is a container where you sequentially add layers of neurons (fully-connected, convolutional, recurrent, etc.) and other operations. It represents your model, i.e. the neural network itself.
model = tf.keras.models.Sequential()
You can then add() layers to the model. A fully-connected layer is called Dense in keras.
Let’s create a MLP with 10 input neurons, two hidden layers with 100 hidden neurons each and 3 output neurons.
The input layer is represented by the Input layer:

model.add(tf.keras.layers.Input((10,)))
The first hidden layer can be added to the model with:
model.add(tf.keras.layers.Dense(100, activation="relu"))
The layer has 100 neurons and uses the ReLU activation function. One could optionally define the activation function as an additional “layer”, but it is usually not needed:
model.add(tf.keras.layers.Dense(100))
model.add(tf.keras.layers.Activation('relu'))
Adding more layers is straightforward:
model.add(tf.keras.layers.Dense(100, activation="relu"))
Finally, we can add the output layer. The activation function depends on the problem:
- For regression problems, a linear activation function should be used when the targets can take any value (e.g. Q-values):
model.add(tf.keras.layers.Dense(3, activation="linear"))
If the targets are bounded between 0 and 1, a logistic/sigmoid function can be used:
model.add(tf.keras.layers.Dense(3, activation="sigmoid"))
- For multi-class classification problems, a softmax activation function should be used:
model.add(tf.keras.layers.Dense(3, activation="softmax"))
This fully defines the structure of your desired neural network.
Q: Implement a neural network for classification with 10 input neurons, two hidden layers with 100 neurons each (using ReLU) and 3 output neurons.
Hint: print(model.summary()) gives you a summary of the architecture of your model. Note in particular the number of trainable parameters (weights and biases).
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Input((10,)))
model.add(tf.keras.layers.Dense(100, activation="relu"))
model.add(tf.keras.layers.Dense(100, activation="relu"))
model.add(tf.keras.layers.Dense(3, activation="softmax"))
print(model.summary())
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 100) 1100
dense_1 (Dense) (None, 100) 10100
dense_2 (Dense) (None, 3) 303
=================================================================
Total params: 11,503
Trainable params: 11,503
Non-trainable params: 0
_________________________________________________________________
None
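The parameter counts follow directly from the layer sizes: a fully-connected layer with N inputs and M neurons has N × M weights plus M biases. Here, the first hidden layer has 10 × 100 + 100 = 1100 parameters, the second 100 × 100 + 100 = 10100, and the output layer 100 × 3 + 3 = 303, i.e. 11,503 trainable parameters in total.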
The next step is to choose an optimizer for the neural network, i.e. a variant of gradient descent that will be used to iteratively modify the parameters.
keras provides an extensive list of optimizers: https://keras.io/optimizers/. The most useful in practice are:
- SGD, the vanilla stochastic gradient descent:

optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9, nesterov=True)
- RMSprop, using second moments:

optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001)
- Adam:

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
Choosing an optimizer is a matter of taste and trial-and-error. In deep RL, a good choice is Adam: the default values for its other parameters are usually good and it converges well, so your only job is to find the right learning rate.
Finally, the model must be compiled by defining:
- A loss function. For multi-class classification, it should be 'categorical_crossentropy'. For regression, it can be 'mse'. See the list of built-in loss functions here: https://keras.io/losses/ but know that you can also simply define your own (a sketch is given after the compile() call below).
- The chosen optimizer.
- The metrics, i.e. what you want tensorflow to print during training. By default it only prints the current value of the loss function. For classification tasks, it usually makes more sense to also print the accuracy.
model.compile(
    loss='categorical_crossentropy',
    optimizer=optimizer,
    metrics=['accuracy']
)
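As an aside, a custom loss passed to compile() is just a Python function taking the true and predicted tensors and returning a loss value. A minimal sketch, where the name my_mse is ours and not part of keras:

def my_mse(y_true, y_pred):
    # Hand-written mean squared error, averaged over the batch
    return tf.reduce_mean(tf.square(y_true - y_pred))

# It can then be passed to compile() like any built-in loss:
# model.compile(loss=my_mse, optimizer=optimizer)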
Q: Compile the model for classification, using the Adam optimizer and a learning rate of 0.01.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

model.compile(
    loss='categorical_crossentropy',
    optimizer=optimizer,
    metrics=['accuracy']
)
Let’s now train the model on some dummy data. To show the power of deep neural networks, we will try to learn noise by heart.
The following cell creates an input tensor X with 1000 random vectors of 10 elements, with values sampled uniformly between -1 and 1. The targets (desired outputs) t are class indices (0, 1 or 2), also randomly selected.
However, neural networks expect one-hot encoded vectors for the target, i.e. (1, 0, 0), (0, 1, 0), (0, 0, 1) instead of 0, 1, 2. The method tf.keras.utils.to_categorical allows you to do that.
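For instance, the three class indices map to the three one-hot vectors:

print(tf.keras.utils.to_categorical([0, 1, 2], 3))
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]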
X = np.random.uniform(-1.0, 1.0, (1000, 10))
t = np.random.randint(0, 3, (1000, ))
T = tf.keras.utils.to_categorical(t, 3)
Let’s learn it. The Sequential model has a method called fit() where you simply pass the training data (X, T) and some other parameters, such as:
- the batch size,
- the total number of epochs,
- the proportion of training examples used to compute the validation loss/accuracy (optional but recommended).
# Training
history = tf.keras.callbacks.History()

model.fit(
    X, T,
    batch_size=100,
    epochs=50,
    validation_split=0.1,
    callbacks=[history]
)
Q: Train the model on the data, using a batch size of 100 for 50 epochs. Explain why you obtained this result.
history = model.fit(
    X, T,
    batch_size=100,
    epochs=50,
    validation_split=0.1,
    verbose=2
)
Epoch 1/50
9/9 - 1s - loss: 1.1183 - accuracy: 0.3244 - val_loss: 1.1404 - val_accuracy: 0.3300 - 1s/epoch - 162ms/step
Epoch 2/50
9/9 - 0s - loss: 1.0802 - accuracy: 0.3878 - val_loss: 1.1295 - val_accuracy: 0.2700 - 96ms/epoch - 11ms/step
Epoch 3/50
9/9 - 0s - loss: 1.0632 - accuracy: 0.4667 - val_loss: 1.1320 - val_accuracy: 0.2500 - 93ms/epoch - 10ms/step
Epoch 4/50
9/9 - 0s - loss: 1.0375 - accuracy: 0.4767 - val_loss: 1.1671 - val_accuracy: 0.2500 - 89ms/epoch - 10ms/step
Epoch 5/50
9/9 - 0s - loss: 1.0111 - accuracy: 0.5111 - val_loss: 1.1752 - val_accuracy: 0.3100 - 89ms/epoch - 10ms/step
Epoch 6/50
9/9 - 0s - loss: 0.9863 - accuracy: 0.5033 - val_loss: 1.2184 - val_accuracy: 0.2800 - 90ms/epoch - 10ms/step
Epoch 7/50
9/9 - 0s - loss: 0.9553 - accuracy: 0.5433 - val_loss: 1.2167 - val_accuracy: 0.2800 - 89ms/epoch - 10ms/step
Epoch 8/50
9/9 - 0s - loss: 0.9197 - accuracy: 0.5678 - val_loss: 1.3535 - val_accuracy: 0.3000 - 90ms/epoch - 10ms/step
Epoch 9/50
9/9 - 0s - loss: 0.8824 - accuracy: 0.5900 - val_loss: 1.3571 - val_accuracy: 0.3200 - 88ms/epoch - 10ms/step
Epoch 10/50
9/9 - 0s - loss: 0.8560 - accuracy: 0.6267 - val_loss: 1.3682 - val_accuracy: 0.2400 - 90ms/epoch - 10ms/step
Epoch 11/50
9/9 - 0s - loss: 0.8125 - accuracy: 0.6467 - val_loss: 1.4521 - val_accuracy: 0.2900 - 89ms/epoch - 10ms/step
Epoch 12/50
9/9 - 0s - loss: 0.7604 - accuracy: 0.6667 - val_loss: 1.4538 - val_accuracy: 0.2700 - 88ms/epoch - 10ms/step
Epoch 13/50
9/9 - 0s - loss: 0.6953 - accuracy: 0.7289 - val_loss: 1.5755 - val_accuracy: 0.2900 - 87ms/epoch - 10ms/step
Epoch 14/50
9/9 - 0s - loss: 0.6484 - accuracy: 0.7500 - val_loss: 1.6463 - val_accuracy: 0.3000 - 88ms/epoch - 10ms/step
Epoch 15/50
9/9 - 0s - loss: 0.6264 - accuracy: 0.7433 - val_loss: 1.7369 - val_accuracy: 0.2800 - 89ms/epoch - 10ms/step
Epoch 16/50
9/9 - 0s - loss: 0.5558 - accuracy: 0.8022 - val_loss: 1.8089 - val_accuracy: 0.2600 - 88ms/epoch - 10ms/step
Epoch 17/50
9/9 - 0s - loss: 0.4886 - accuracy: 0.8300 - val_loss: 1.9365 - val_accuracy: 0.2700 - 93ms/epoch - 10ms/step
Epoch 18/50
9/9 - 0s - loss: 0.4341 - accuracy: 0.8644 - val_loss: 2.0011 - val_accuracy: 0.3100 - 101ms/epoch - 11ms/step
Epoch 19/50
9/9 - 0s - loss: 0.3999 - accuracy: 0.8700 - val_loss: 2.1024 - val_accuracy: 0.2900 - 106ms/epoch - 12ms/step
Epoch 20/50
9/9 - 0s - loss: 0.3671 - accuracy: 0.8833 - val_loss: 2.3319 - val_accuracy: 0.3400 - 99ms/epoch - 11ms/step
Epoch 21/50
9/9 - 0s - loss: 0.3761 - accuracy: 0.8667 - val_loss: 2.4605 - val_accuracy: 0.2700 - 100ms/epoch - 11ms/step
Epoch 22/50
9/9 - 0s - loss: 0.3324 - accuracy: 0.8822 - val_loss: 2.4886 - val_accuracy: 0.3200 - 97ms/epoch - 11ms/step
Epoch 23/50
9/9 - 0s - loss: 0.2755 - accuracy: 0.9100 - val_loss: 2.5855 - val_accuracy: 0.3200 - 98ms/epoch - 11ms/step
Epoch 24/50
9/9 - 0s - loss: 0.2489 - accuracy: 0.9256 - val_loss: 2.7775 - val_accuracy: 0.3400 - 98ms/epoch - 11ms/step
Epoch 25/50
9/9 - 0s - loss: 0.2084 - accuracy: 0.9489 - val_loss: 2.7613 - val_accuracy: 0.2900 - 93ms/epoch - 10ms/step
Epoch 26/50
9/9 - 0s - loss: 0.1561 - accuracy: 0.9744 - val_loss: 3.0682 - val_accuracy: 0.3000 - 97ms/epoch - 11ms/step
Epoch 27/50
9/9 - 0s - loss: 0.1364 - accuracy: 0.9756 - val_loss: 3.0417 - val_accuracy: 0.2900 - 91ms/epoch - 10ms/step
Epoch 28/50
9/9 - 0s - loss: 0.1177 - accuracy: 0.9833 - val_loss: 3.2456 - val_accuracy: 0.2900 - 92ms/epoch - 10ms/step
Epoch 29/50
9/9 - 0s - loss: 0.0996 - accuracy: 0.9933 - val_loss: 3.3559 - val_accuracy: 0.3000 - 90ms/epoch - 10ms/step
Epoch 30/50
9/9 - 0s - loss: 0.0882 - accuracy: 0.9911 - val_loss: 3.4259 - val_accuracy: 0.3000 - 90ms/epoch - 10ms/step
Epoch 31/50
9/9 - 0s - loss: 0.0788 - accuracy: 0.9944 - val_loss: 3.6843 - val_accuracy: 0.2800 - 90ms/epoch - 10ms/step
Epoch 32/50
9/9 - 0s - loss: 0.0636 - accuracy: 0.9978 - val_loss: 3.6871 - val_accuracy: 0.3000 - 89ms/epoch - 10ms/step
Epoch 33/50
9/9 - 0s - loss: 0.0502 - accuracy: 1.0000 - val_loss: 3.8740 - val_accuracy: 0.3000 - 87ms/epoch - 10ms/step
Epoch 34/50
9/9 - 0s - loss: 0.0451 - accuracy: 0.9989 - val_loss: 3.7788 - val_accuracy: 0.3100 - 87ms/epoch - 10ms/step
Epoch 35/50
9/9 - 0s - loss: 0.0377 - accuracy: 1.0000 - val_loss: 4.1327 - val_accuracy: 0.2900 - 89ms/epoch - 10ms/step
Epoch 36/50
9/9 - 0s - loss: 0.0320 - accuracy: 1.0000 - val_loss: 3.9984 - val_accuracy: 0.2900 - 91ms/epoch - 10ms/step
Epoch 37/50
9/9 - 0s - loss: 0.0261 - accuracy: 1.0000 - val_loss: 4.1676 - val_accuracy: 0.2900 - 88ms/epoch - 10ms/step
Epoch 38/50
9/9 - 0s - loss: 0.0223 - accuracy: 1.0000 - val_loss: 4.2871 - val_accuracy: 0.2800 - 91ms/epoch - 10ms/step
Epoch 39/50
9/9 - 0s - loss: 0.0198 - accuracy: 1.0000 - val_loss: 4.3003 - val_accuracy: 0.2900 - 89ms/epoch - 10ms/step
Epoch 40/50
9/9 - 0s - loss: 0.0175 - accuracy: 1.0000 - val_loss: 4.4379 - val_accuracy: 0.2800 - 90ms/epoch - 10ms/step
Epoch 41/50
9/9 - 0s - loss: 0.0164 - accuracy: 1.0000 - val_loss: 4.4096 - val_accuracy: 0.2800 - 89ms/epoch - 10ms/step
Epoch 42/50
9/9 - 0s - loss: 0.0145 - accuracy: 1.0000 - val_loss: 4.5412 - val_accuracy: 0.2900 - 90ms/epoch - 10ms/step
Epoch 43/50
9/9 - 0s - loss: 0.0134 - accuracy: 1.0000 - val_loss: 4.5190 - val_accuracy: 0.2900 - 89ms/epoch - 10ms/step
Epoch 44/50
9/9 - 0s - loss: 0.0124 - accuracy: 1.0000 - val_loss: 4.6218 - val_accuracy: 0.2900 - 89ms/epoch - 10ms/step
Epoch 45/50
9/9 - 0s - loss: 0.0115 - accuracy: 1.0000 - val_loss: 4.6663 - val_accuracy: 0.2800 - 87ms/epoch - 10ms/step
Epoch 46/50
9/9 - 0s - loss: 0.0108 - accuracy: 1.0000 - val_loss: 4.6790 - val_accuracy: 0.3000 - 91ms/epoch - 10ms/step
Epoch 47/50
9/9 - 0s - loss: 0.0101 - accuracy: 1.0000 - val_loss: 4.7649 - val_accuracy: 0.2900 - 87ms/epoch - 10ms/step
Epoch 48/50
9/9 - 0s - loss: 0.0096 - accuracy: 1.0000 - val_loss: 4.8301 - val_accuracy: 0.2900 - 95ms/epoch - 11ms/step
Epoch 49/50
9/9 - 0s - loss: 0.0089 - accuracy: 1.0000 - val_loss: 4.8164 - val_accuracy: 0.2900 - 90ms/epoch - 10ms/step
Epoch 50/50
9/9 - 0s - loss: 0.0084 - accuracy: 1.0000 - val_loss: 4.9097 - val_accuracy: 0.2900 - 88ms/epoch - 10ms/step
A: The final training accuracy is 100%, while the validation accuracy stays around 33%, i.e. chance level (the exact values may vary with the initialization). The network has learned the training examples by heart, although they are totally random, but completely fails to generalize.
The main reason is that we have only 1000 training examples, while the network has around 11,500 free parameters. The model can therefore learn this training set perfectly, although it is totally random, but its capacity (VC dimension) is way too high to generalize. It is even worse here: as the data is random, there is nothing to generalize. A nice example to understand why neural networks overfit…
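A quick way to see the gap directly is to read the last entries of the History object returned by fit() above (the key names are the ones used later in this notebook):

print("Final training accuracy:  ", history.history['accuracy'][-1])
print("Final validation accuracy:", history.history['val_accuracy'][-1])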
Training a MLP on MNIST
Let’s now try to learn something a bit more serious, the MNIST dataset. The following cell loads the MNIST data (training set of 60,000 28x28 monochrome images, test set of 10,000 images), normalizes it (values between 0 and 1 for each pixel), removes the mean image of the training set and transforms the targets into one-hot encoded vectors for the 10 classes. See the neurocomputing exercise for more details.
# Load the MNIST dataset
(X_train, t_train), (X_test, t_test) = tf.keras.datasets.mnist.load_data()
print("Training data:", X_train.shape, t_train.shape)
print("Test data:", X_test.shape, t_test.shape)

# Reshape the images to vectors and normalize
X_train = X_train.reshape(X_train.shape[0], 784).astype('float32') / 255.
X_test = X_test.reshape(X_test.shape[0], 784).astype('float32') / 255.

# Mean removal
X_mean = np.mean(X_train, axis=0)
X_train -= X_mean
X_test -= X_mean

# One-hot encoded outputs
T_train = tf.keras.utils.to_categorical(t_train, 10)
T_test = tf.keras.utils.to_categorical(t_test, 10)
Training data: (60000, 28, 28) (60000,)
Test data: (10000, 28, 28) (10000,)
Q: Create a fully connected neural network with 784 input neurons (one per pixel), 10 softmax output neurons and whatever you want in the middle, so that it can reach around 98% validation accuracy after 20 epochs.
- Put the network creation (including compile()) in a method create_model(), so that you can create a model multiple times.
- Choose a good value for the learning rate.
- Do not exaggerate with the number of layers and neurons. Two or three hidden layers with 100 to 300 neurons are more than enough.
- You will quickly observe that the network overfits: the training accuracy is higher than the validation accuracy. The training accuracy actually goes to 100% if your network is too big. In that case, feel free to add a dropout layer after each fully-connected layer:
model.add(tf.keras.layers.Dropout(0.5))
def create_model():
    # Create the model
    model = tf.keras.models.Sequential()

    # Input layer with 784 pixels
    model.add(tf.keras.layers.Input((784,)))

    # Hidden layer with 150 neurons
    model.add(tf.keras.layers.Dense(150, activation="relu"))
    model.add(tf.keras.layers.Dropout(0.5))

    # Second hidden layer with 100 neurons
    model.add(tf.keras.layers.Dense(100, activation="relu"))
    model.add(tf.keras.layers.Dropout(0.5))

    # Softmax output layer with 10 neurons
    model.add(tf.keras.layers.Dense(10, activation="softmax"))

    # Learning rule
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

    # Loss function
    model.compile(
        loss='categorical_crossentropy', # loss
        optimizer=optimizer,             # learning rule
        metrics=['accuracy']             # show accuracy
    )

    print(model.summary())

    return model

model = create_model()

# Training
history = tf.keras.callbacks.History()

model.fit(
    X_train, T_train,
    batch_size=100,
    epochs=20,
    validation_split=0.1,
    callbacks=[history]
)
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_3 (Dense) (None, 150) 117750
dropout (Dropout) (None, 150) 0
dense_4 (Dense) (None, 100) 15100
dropout_1 (Dropout) (None, 100) 0
dense_5 (Dense) (None, 10) 1010
=================================================================
Total params: 133,860
Trainable params: 133,860
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/20
540/540 [==============================] - 6s 11ms/step - loss: 0.5185 - accuracy: 0.8411 - val_loss: 0.1482 - val_accuracy: 0.9577
Epoch 2/20
540/540 [==============================] - 6s 10ms/step - loss: 0.2497 - accuracy: 0.9261 - val_loss: 0.1195 - val_accuracy: 0.9637
Epoch 3/20
540/540 [==============================] - 5s 10ms/step - loss: 0.2007 - accuracy: 0.9414 - val_loss: 0.0993 - val_accuracy: 0.9705
Epoch 4/20
540/540 [==============================] - 5s 10ms/step - loss: 0.1732 - accuracy: 0.9485 - val_loss: 0.0932 - val_accuracy: 0.9718
Epoch 5/20
540/540 [==============================] - 5s 10ms/step - loss: 0.1580 - accuracy: 0.9530 - val_loss: 0.0823 - val_accuracy: 0.9747
Epoch 6/20
540/540 [==============================] - 5s 10ms/step - loss: 0.1453 - accuracy: 0.9562 - val_loss: 0.0819 - val_accuracy: 0.9752
Epoch 7/20
540/540 [==============================] - 5s 10ms/step - loss: 0.1319 - accuracy: 0.9602 - val_loss: 0.0800 - val_accuracy: 0.9747
Epoch 8/20
540/540 [==============================] - 5s 10ms/step - loss: 0.1238 - accuracy: 0.9616 - val_loss: 0.0740 - val_accuracy: 0.9780
Epoch 9/20
540/540 [==============================] - 5s 10ms/step - loss: 0.1178 - accuracy: 0.9639 - val_loss: 0.0726 - val_accuracy: 0.9772
Epoch 10/20
540/540 [==============================] - 6s 10ms/step - loss: 0.1109 - accuracy: 0.9657 - val_loss: 0.0719 - val_accuracy: 0.9785
Epoch 11/20
540/540 [==============================] - 6s 10ms/step - loss: 0.1055 - accuracy: 0.9670 - val_loss: 0.0729 - val_accuracy: 0.9772
Epoch 12/20
540/540 [==============================] - 5s 10ms/step - loss: 0.1029 - accuracy: 0.9681 - val_loss: 0.0718 - val_accuracy: 0.9777
Epoch 13/20
540/540 [==============================] - 6s 10ms/step - loss: 0.1007 - accuracy: 0.9683 - val_loss: 0.0664 - val_accuracy: 0.9803
Epoch 14/20
540/540 [==============================] - 5s 10ms/step - loss: 0.0939 - accuracy: 0.9701 - val_loss: 0.0688 - val_accuracy: 0.9778
Epoch 15/20
540/540 [==============================] - 5s 10ms/step - loss: 0.0916 - accuracy: 0.9716 - val_loss: 0.0684 - val_accuracy: 0.9793
Epoch 16/20
540/540 [==============================] - 5s 10ms/step - loss: 0.0889 - accuracy: 0.9723 - val_loss: 0.0661 - val_accuracy: 0.9822
Epoch 17/20
540/540 [==============================] - 5s 10ms/step - loss: 0.0845 - accuracy: 0.9740 - val_loss: 0.0687 - val_accuracy: 0.9807
Epoch 18/20
540/540 [==============================] - 5s 10ms/step - loss: 0.0840 - accuracy: 0.9733 - val_loss: 0.0685 - val_accuracy: 0.9812
Epoch 19/20
540/540 [==============================] - 6s 11ms/step - loss: 0.0791 - accuracy: 0.9755 - val_loss: 0.0717 - val_accuracy: 0.9808
Epoch 20/20
540/540 [==============================] - 5s 10ms/step - loss: 0.0817 - accuracy: 0.9740 - val_loss: 0.0669 - val_accuracy: 0.9815
<keras.callbacks.History at 0x16579a610>
After training, one should evaluate the model on the test set. keras provides an evaluate() method that computes the metrics (in our case the loss and the accuracy) on the given data:
score = model.evaluate(X_test, T_test)
Another solution would be to predict() labels on the test set and manually compare them to the ground truth:
Y = model.predict(X_test)
# Categorical cross-entropy: sum over the classes, average over the samples
loss = - np.mean(np.sum(T_test * np.log(Y), axis=1))
predicted_classes = np.argmax(Y, axis=1)
accuracy = 1.0 - np.sum(predicted_classes != t_test)/t_test.shape[0]
Another important thing to visualize after training is how the training and validation loss (or accuracy) evolved during training. The fit() method fills a History object which contains the history of your metrics (loss and accuracy) after each epoch of training. These are simple lists of values, accessible with:
history.history['loss']
history.history['val_loss']
history.history['accuracy']
history.history['val_accuracy']
Q: Compute the test loss and accuracy of your model. Plot the history of the training and validation loss/accuracy.
# Testing
score = model.evaluate(X_test, T_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

plt.figure(figsize=(15, 6))
plt.subplot(121)
plt.plot(history.history['loss'], '-r', label="Training")
plt.plot(history.history['val_loss'], '-b', label="Validation")
plt.xlabel('Epoch #')
plt.ylabel('Loss')
plt.legend()
plt.subplot(122)
plt.plot(history.history['accuracy'], '-r', label="Training")
plt.plot(history.history['val_accuracy'], '-b', label="Validation")
plt.xlabel('Epoch #')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
313/313 [==============================] - 3s 9ms/step - loss: 0.0763 - accuracy: 0.9782
Test loss: 0.07633379101753235
Test accuracy: 0.9782000184059143