- When we do forward propagation, the output always has some error: the computed output differs from the real (target) output.
- The difference between the real output and the computed output is the error.
- Backpropagation is a method of updating the weights and biases by propagating this error from the output back through the whole network so that the error is reduced.
- This is effectively applying gradient descent to each individual neuron so that its weights and bias are updated towards an optimized result (a minimal single-neuron sketch follows below).
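A minimal sketch of that idea for a single linear neuron with a squared-error loss; the values and names (`input_data`, `weights`, `target`, `learning_rate`) are illustrative assumptions, not from these notes:

```python
import numpy as np

# Illustrative values (assumptions, not from the notes)
input_data = np.array([1.0, 2.0])
weights = np.array([0.5, -0.3])
target = 0.8
learning_rate = 0.01

# Forward propagation: a single linear neuron
predicted = np.dot(input_data, weights)
error = predicted - target                      # the error described above

# Slope of the squared error (error ** 2) with respect to each weight
slope_of_weights = 2 * error * input_data

# Gradient descent: step against the slope to reduce the error
weights = weights - learning_rate * slope_of_weights
```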
- Slope for a weight between the last hidden layer and the output node = 2 * (predicted value from forward propagation - actual value) * value of the hidden node feeding that weight * slope of the activation function at the output node (1 for ReLU when the node is active)
- Slope of any weight = value of the node feeding into that weight * slope at the node the weight feeds into * slope of the activation function at that node (1 for ReLU when active)
- Updated weight = weight - learning rate * slope of weight
- The slope backpropagated to a node other than an input node combines the slopes of the nodes it feeds into, the weights on those connections, and the slope of its own activation function (see the two-layer sketch below and the summation rule at the end of this list)
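To make the formulas above concrete, here is a hedged NumPy sketch of one forward and backward pass for a tiny network with one ReLU hidden layer and a linear output; all numbers and names are illustrative assumptions:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def relu_slope(z):
    # Slope of ReLU: 1 where the node is active, 0 otherwise
    return (z > 0).astype(float)

# Illustrative values (assumptions, not from the notes)
x = np.array([2.0, 3.0])                 # input node values
W1 = np.array([[1.0, 1.0],               # weights: input -> hidden
               [-0.5, 0.5]])
w2 = np.array([2.0, -1.0])               # weights: hidden -> output
target = 5.0
learning_rate = 0.01

# Forward propagation
hidden_in = W1 @ x
hidden_out = relu(hidden_in)
predicted = w2 @ hidden_out              # linear output node

# Slope of the squared error at the output node: 2 * (predicted - actual)
output_slope = 2 * (predicted - target)

# Slope of each hidden -> output weight = hidden node value * output slope
slope_w2 = hidden_out * output_slope

# Slope backpropagated to each hidden node =
#   outgoing weight * output slope * slope of ReLU at that hidden node
hidden_slope = w2 * output_slope * relu_slope(hidden_in)

# Slope of each input -> hidden weight = input node value * hidden node slope
slope_W1 = np.outer(hidden_slope, x)

# Gradient descent updates: weight = weight - learning rate * slope
w2 = w2 - learning_rate * slope_w2
W1 = W1 - learning_rate * slope_W1
```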
- It is common to calculate slopes on only a subset of the data (a batch) for computational efficiency
- Use a different batch of data to calculate the next update
- Start over from the beginning once all data is used
- Each full pass through the training data is called an epoch
- When slopes are calculated on one batch at a time, this is stochastic gradient descent (SGD); see the loop sketch below
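A hedged sketch of that batching loop for a simple linear model with a squared-error loss; the data, batch size, and learning rate are made up for illustration:

```python
import numpy as np

# Illustrative data: 100 samples, 3 features (assumptions, not from the notes)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

weights = np.zeros(3)
learning_rate = 0.01
batch_size = 20
n_epochs = 5

for epoch in range(n_epochs):           # one epoch = one pass through all the data
    indices = rng.permutation(len(X))   # shuffle, then walk through the data in batches
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        X_batch, y_batch = X[batch], y[batch]

        # Slopes calculated on this batch only (stochastic gradient descent)
        error = X_batch @ weights - y_batch
        slopes = 2 * X_batch.T @ error / len(batch)

        # Update the weights, then move on to the next batch
        weights = weights - learning_rate * slopes
```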
- Backpropagation takes the prediction error from the output layer and propagates it back through the hidden layers to the input layer.
- Thus, it allows gradient descent to update all the weights in the neural network (via the chain rule of calculus)
- The slope for a node's value is the sum, over all weights coming out of that node, of (weight value * slope at the node the weight feeds into)
- The contribution passed back to a previous-layer node along a weight is that weight's value multiplied by the slope at the node it feeds into (a short sketch of this summation follows below)
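A small hedged example of that summation rule, with made-up numbers (one hidden node feeding two output nodes):

```python
import numpy as np

# Illustrative values (assumptions, not from the notes)
hidden_value = 3.0
outgoing_weights = np.array([0.5, -1.5])   # hidden -> output weights
output_slopes = np.array([4.0, 2.0])       # slopes already computed at the two output nodes

# Slope of the hidden node's value:
#   sum over outgoing weights of (weight value * slope at the node it feeds into)
hidden_node_slope = np.sum(outgoing_weights * output_slopes)

# Slopes of the outgoing weights themselves: hidden node value * downstream slope
outgoing_weight_slopes = hidden_value * output_slopes

print(hidden_node_slope, outgoing_weight_slopes)
```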