L2 loss function
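The formula itself is missing here; a common definition of the L2 (quadratic) loss, assuming $y(x)$ is the desired output, $a^L(x)$ is the network's output for training input $x$, and $n$ is the number of training inputs, is:

$$C(w, b) = \frac{1}{2n} \sum_x \left\lVert y(x) - a^L(x) \right\rVert^2$$

The factor $\frac{1}{2}$ is a convention that cancels the 2 produced by differentiation.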
Neural Network Training
What we need to look for through neural network training are weights and biases that minimize the value of the loss function $C$. When $v$ is a vector representing the weights and biases, a small change $\Delta v$ changes the loss by $\Delta C \approx \nabla C \cdot \Delta v$.
It must be $\Delta C \le 0$, because $C$ should decrease. Therefore, $\Delta v$ can be determined as $\Delta v = -\eta \nabla C$, which gives $\Delta C \approx -\eta \lVert \nabla C \rVert^2 \le 0$.
$\eta$ is called the learning rate and $-\eta \nabla C$ is called the step. If the step is large, $C$ may diverge, and if the step is small, the convergence speed may be slow, so an appropriate value of $\eta$ should be determined.
If $\Delta v$ is determined, then $v$ can be updated by the rule $v \to v' = v - \eta \nabla C$.
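The update rule can be sketched on a toy problem. This is a hypothetical example (not from the text): minimizing $C(v) = \lVert v \rVert^2$, whose gradient is $2v$.

```python
import numpy as np

def grad_C(v):
    # gradient of the toy loss C(v) = ||v||^2
    return 2.0 * v

v = np.array([3.0, -4.0])  # initial weights/biases vector
eta = 0.1                  # learning rate

for _ in range(100):
    v = v - eta * grad_C(v)  # v -> v' = v - eta * grad C

# v converges toward the minimizer [0, 0]
```

With $\eta = 0.1$ each iteration multiplies $v$ by $0.8$, so the loss shrinks monotonically; a larger $\eta$ (here, anything above 1.0) would make the iterates diverge instead.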
Stochastic gradient descent
When the number of training inputs is very large, computing $\nabla C$ over all of them can take a long time. Stochastic gradient descent works by estimating $\nabla C$ from a small number of randomly chosen training inputs and averaging their gradients.
Those random training inputs are called a mini-batch.
Forward propagation (or forward pass) refers to the calculation and storage of intermediate variables (including outputs) for a neural network in order from the input layer to the output layer.
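The "calculation and storage" can be made concrete with a small sketch. The layer sizes, sigmoid activation, and variable names here are assumptions for illustration; the stored $z^l$ and $a^l$ are exactly what back-propagation will need later.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward pass: compute and store z^l and a^l for every layer."""
    a = x
    zs, activations = [], [x]
    for W, b in zip(weights, biases):
        z = W @ a + b        # weighted input of this layer
        a = sigmoid(z)       # activation of this layer
        zs.append(z)
        activations.append(a)
    return zs, activations

# hypothetical 2-3-1 network with random fixed weights
rng = np.random.default_rng(1)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
biases = [np.zeros(3), np.zeros(1)]
zs, activations = forward(np.array([0.5, -0.2]), weights, biases)
```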
Back-propagation is used to find $\partial C / \partial w$ and $\partial C / \partial b$, because it is difficult for a computer to obtain them by differentiating the loss function directly.
The error $\delta^l_j$ of neuron $j$ in layer $l$ is defined as $\delta^l_j \equiv \partial C / \partial z^l_j$, where $z^l_j$ is the weighted input to that neuron.
Since $z^l$ was obtained from forward propagation, if we know $\delta^{l+1}$, we can get $\delta^l$ as below:

$$\delta^l = \left( (w^{l+1})^T \delta^{l+1} \right) \odot \sigma'(z^l)$$
If we use the L2 loss, since $a^L$ and $z^L$ were obtained from forward propagation and $\partial C / \partial a^L_j = a^L_j - y_j$, we can get the error in the output layer like this:

$$\delta^L = (a^L - y) \odot \sigma'(z^L)$$
Finally, $\partial C / \partial w^l_{jk}$ and $\partial C / \partial b^l_j$ can be obtained by using the errors obtained above:

$$\frac{\partial C}{\partial b^l_j} = \delta^l_j, \qquad \frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \, \delta^l_j$$
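These steps can be combined into one sketch. The network shape and sigmoid activation are assumptions for illustration; the function runs the forward pass, computes the output-layer error, propagates it backwards, and reads off the gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, y, weights, biases):
    """Backprop sketch for the L2 loss on a single training input."""
    # forward pass, storing z^l and a^l
    a, zs, activations = x, [], [x]
    for W, b in zip(weights, biases):
        z = W @ a + b
        a = sigmoid(z)
        zs.append(z)
        activations.append(a)

    # output-layer error: delta^L = (a^L - y) * sigma'(z^L)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    grads_w = [None] * len(weights)
    grads_b = [None] * len(biases)
    grads_b[-1] = delta
    grads_w[-1] = np.outer(delta, activations[-2])

    # propagate backwards: delta^l = (w^{l+1}.T @ delta^{l+1}) * sigma'(z^l)
    for l in range(len(weights) - 2, -1, -1):
        delta = (weights[l + 1].T @ delta) * sigmoid_prime(zs[l])
        grads_b[l] = delta
        grads_w[l] = np.outer(delta, activations[l])  # dC/dw = a^{l-1}_k * delta^l_j
    return grads_w, grads_b

# hypothetical 2-3-1 network
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
biases = [rng.normal(size=3), rng.normal(size=1)]
grads_w, grads_b = backprop(np.array([0.3, -0.7]), np.array([0.5]), weights, biases)
```

A useful sanity check on any backprop implementation is to compare one of these gradients against a numerical central difference of the loss.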
Set initial weights and biases to random values and repeat the process forward propagation -> back-propagation -> weights and biases update. When it is judged that $C$ cannot be made smaller, the final weights and biases are determined.