Watch the 3blue1brown series **before** the lecture: https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi

- Video 1: But what is a neural network?
- Video 2: Gradient descent, how neural networks learn
- Video 3: What is backpropagation really doing?
- Video 4 (optional): Backpropagation calculus

Moreover, read the Illustrated Guide to Recurrent Neural Networks by Michael Nguyen.

- Neurons
- Weights and biases
- Activation function
- Hidden layers
- Feed forward
- Back propagation and gradient descent
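The building blocks above can be sketched in a few lines of NumPy. This is an illustrative toy network (the sizes and names are made up, not taken from the course notebooks):

```python
import numpy as np

def sigmoid(z):
    # Activation function: squashes any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# A tiny network: 3 inputs -> 4 hidden neurons -> 2 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # weights and biases, hidden layer
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # weights and biases, output layer

def feed_forward(x):
    hidden = sigmoid(W1 @ x + b1)        # hidden layer activations
    return sigmoid(W2 @ hidden + b2)     # output layer activations

output = feed_forward(np.array([0.5, -1.0, 2.0]))
```

Training would then use backpropagation and gradient descent to adjust `W1`, `b1`, `W2`, `b2` so the outputs match the labels.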

- Feed Forward (FF)
- Recurrent Neural Network (RNN)
- Long Short Term Memory (LSTM)
- … and others that are still not widely used in SE, such as Convolutional Neural Networks (CNNs).

See the Neural Network Zoo by the Asimov Institute.

Our goal is to create a Neural Network that can recognize handwritten digits.

We use the MNIST dataset, which contains 60,000 training examples and 10,000 test examples.

Open the “feed-forward-nn-hand-written-recognition” Jupyter notebook.


- RNNs: when order matters!
- RNNs may struggle to retain information from far back in the sequence (“vanishing gradients”).
- LSTM: Long Short-Term Memory
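The vanishing-gradient problem can be seen with simple arithmetic. Backpropagation through time multiplies one derivative factor per timestep; with a sigmoid activation each factor is at most 0.25, so the gradient with respect to early inputs shrinks exponentially. A minimal sketch (the weight and pre-activation values are hypothetical, chosen to show the best case for the sigmoid):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = 1.0    # hypothetical recurrent weight
z = 0.0    # pre-activation at each step; sigmoid'(0) = 0.25 is the maximum
grad = 1.0
for step in range(50):
    # One factor per timestep: d sigmoid(z)/dz = s * (1 - s)
    grad *= w * sigmoid(z) * (1 - sigmoid(z))
print(grad)   # roughly 0.25**50: effectively zero after 50 steps
```

LSTMs mitigate this with gated cell states that let gradients flow over long ranges without being repeatedly squashed.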

Our goal is to create an RNN that writes songs like Freddie Mercury.

We use all of Queen’s songs as the dataset.

Open the “rnn-and-lstm-sing-like-freddy” Jupyter notebook.

It is hard to know in advance the best architecture for your problem.

We have to experiment with different hyperparameters: number of layers, neurons per layer, learning rate, activation functions.

Machine learning is empirical!
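In practice, experimenting often means sweeping a grid of hyperparameter combinations. A minimal sketch of such a grid (the search space and the `train_and_evaluate` helper are hypothetical, not from the course notebooks):

```python
from itertools import product

# Hypothetical search space; values are illustrative.
layers = [1, 2, 3]
neurons = [32, 64, 128]
learning_rates = [0.1, 0.01, 0.001]
activations = ["relu", "tanh"]

configs = list(product(layers, neurons, learning_rates, activations))
print(len(configs))   # 3 * 3 * 3 * 2 = 54 configurations to try

for n_layers, n_neurons, lr, act in configs:
    # train_and_evaluate(n_layers, n_neurons, lr, act)  # hypothetical helper
    pass
```

Even this tiny grid gives 54 runs, which is why hyperparameter tuning dominates so much of the empirical work.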

Too few layers/neurons: underfitting. The problem may be too complex to represent with so few neurons.

Too many layers/neurons: overfitting. The network might just “memorize” the training data rather than learn to generalize.
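The same effect shows up in a much simpler model family: polynomial fitting. A low-degree polynomial underfits noisy data, while a high-degree one drives training error toward zero by fitting the noise (this is a stand-in illustration, not a neural network):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)  # noisy data

def train_mse(degree):
    # Fit a polynomial of the given degree and measure error on the TRAINING data.
    coeffs = np.polyfit(x, y, degree)
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

e_under = train_mse(1)   # too simple: large training error (underfitting)
e_over = train_mse(9)    # high capacity: much lower training error,
                         # but it is partly fitting the noise (overfitting)
print(e_under, e_over)
```

The lower training error of the high-degree model says nothing about how it performs on unseen data, which is why evaluation is always done on a held-out test set.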

Choose a:

- Linear function for regression problems.
- Sigmoid for binary classification.
- Softmax for probabilities and multiclassification.
- ReLU for the hidden layers.
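These four functions are a few lines of NumPy each. A minimal sketch (the stability trick in `softmax` is standard practice, not specific to this course):

```python
import numpy as np

def linear(z):            # regression output: unbounded real values
    return z

def sigmoid(z):           # binary classification output: a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):           # multi-class output: probabilities that sum to 1
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

def relu(z):              # hidden layers: cheap, and does not saturate for z > 0
    return np.maximum(0.0, z)

probs = softmax(np.array([2.0, 1.0, 0.1]))   # e.g. scores for three classes
```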

Read a simple explanation of activation functions here.

Choose:

- Binary Cross-entropy for binary classification problems
- Cross-entropy for multi-class classification problems
- Mean Squared Error for regression problems
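Each of these losses has a direct NumPy expression. A minimal sketch (the function names are illustrative; real frameworks provide numerically hardened versions):

```python
import numpy as np

def binary_cross_entropy(y_true, p):
    # y_true in {0, 1}; p = predicted probability of class 1.
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def cross_entropy(y_true_onehot, p):
    # y_true_onehot: one-hot rows; p: predicted class probabilities per row.
    return -np.mean(np.sum(y_true_onehot * np.log(p), axis=1))

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Confident, correct predictions give a loss close to zero.
bce = binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.9, 0.1]))
mse = mean_squared_error(np.array([1.0, 2.0]), np.array([1.0, 2.0]))
```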

Read this nice explanation on how to choose activation and loss functions.

The batch size and the number of training iterations should also be tuned.

Read the tradeoff batch size vs number of iterations to train a NN discussion on Stack Overflow.
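The core of the tradeoff is simple arithmetic: with a fixed dataset, the batch size determines how many weight updates (iterations) one pass over the data (an epoch) contains. A quick sketch, using the MNIST training-set size as an example:

```python
# Smaller batches -> noisier gradient estimates but more frequent updates;
# larger batches -> smoother gradients but fewer updates per epoch.
n_examples = 60_000  # e.g. the MNIST training set
iterations_per_epoch = {bs: n_examples // bs for bs in (32, 128, 512)}
print(iterations_per_epoch)
```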

Dropout is a technique used to reduce overfitting in neural networks.

Basically, during training a fraction of the neurons in a layer (e.g., half) is deactivated at random. This improves generalization because it forces the layer to learn the same “concept” with different neurons.

During the prediction phase, dropout is deactivated.
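This mechanism fits in a few lines of NumPy. The sketch below uses "inverted" dropout, the variant common in modern frameworks, which rescales the surviving activations during training so that nothing needs to change at prediction time:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate=0.5, training=True):
    # At prediction time dropout is a no-op.
    if not training:
        return activations
    # Zero out a random fraction `rate` of the neurons, then scale the
    # survivors so the expected activation stays the same.
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

h = np.ones(1000)              # pretend hidden-layer activations
out = dropout(h, rate=0.5)     # ~half the entries become 0, the rest become 2
```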

(Extracted from Leonardo Araujo Santos’s online book)

- Training vs test
- k-fold validation (really needed in Deep Learning?)
- Accuracy, precision, recall
- Comparison with a baseline
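For the binary case, accuracy, precision, and recall all derive from the confusion-matrix counts. A minimal sketch (the example labels are made up):

```python
import numpy as np

def evaluate(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives
    accuracy = np.mean(y_pred == y_true)         # fraction of correct predictions
    precision = tp / (tp + fp)                   # of the predicted 1s, how many were right
    recall = tp / (tp + fn)                      # of the actual 1s, how many were found
    return accuracy, precision, recall

y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])
acc, prec, rec = evaluate(y_true, y_pred)
```

A strong baseline to compare against is the majority-class predictor: a model is only interesting if it beats it.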

The course contents are copyrighted (c) 2018 - onwards by TU Delft and their respective authors and licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.