Back Propagation: In the last section, we learned how a neural network adjusts its weights and biases using Stochastic Gradient Descent. Now, let us understand how exactly the gradient of the cost function is calculated.
At the heart of back propagation is the partial derivative (∂C/∂W) of the cost function C with respect to each weight (or bias) in the network. This partial derivative measures how quickly the cost function changes when a weight or bias changes. Note that this is not the learning rate: the learning rate is a separate parameter that decides how large a step we take in the direction of the gradient. So back propagation tells us how changing each weight will impact the cost function, and therefore how to move towards the optimized combination of weights that minimizes the cost function.
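To make the distinction concrete, here is a minimal illustrative sketch (not code from this book): for a one-weight model, we estimate ∂C/∂w numerically, then apply the gradient-descent update rule w := w − learning_rate × ∂C/∂w. All names and values here are assumptions for illustration.

```python
def cost(w, x, y_actual):
    # C = 1/2 (predicted - actual)^2 for a one-weight model y_pred = w * x
    y_pred = w * x
    return 0.5 * (y_pred - y_actual) ** 2

def numerical_gradient(w, x, y_actual, eps=1e-6):
    # central difference: how quickly the cost changes as w changes
    return (cost(w + eps, x, y_actual) - cost(w - eps, x, y_actual)) / (2 * eps)

w = 0.1                       # current weight (illustrative value)
x, y_actual = 2.0, 1.0        # one training observation
learning_rate = 0.1           # scales the size of each update step

grad = numerical_gradient(w, x, y_actual)
w = w - learning_rate * grad  # the cost is lower after this step
```

The gradient tells us the direction and steepness; the learning rate only scales how far we move along it.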
Forward Propagation is the process in which information flows from the input layer through the hidden layers: at each neuron we take a weighted sum of the inputs, apply an activation function, and pass the result forward until we reach the output and can evaluate the cost function.
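A minimal sketch of forward propagation for one observation (illustrative, assuming one hidden neuron and a sigmoid activation; the weights and inputs are made-up values):

```python
import math

def sigmoid(z):
    # a common activation function, squashing any input into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    # hidden layer: weighted sum of the inputs plus bias, then activation
    z_hidden = sum(w * xi for w, xi in zip(w_hidden, x)) + b_hidden
    a_hidden = sigmoid(z_hidden)
    # output layer: weighted sum of the hidden activation, then activation
    z_out = w_out * a_hidden + b_out
    return sigmoid(z_out)

# pass a single observation with two input variables through the network
y_pred = forward([0.5, 1.0], [0.2, -0.3], 0.1, 0.4, 0.0)
```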
Once we get the final output, we compute the cost function, which measures the error, i.e. the difference between the actual and predicted values.
Cost Function = ∑ ½ (Predicted Value – Actual Value)²
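The cost function above can be written out directly (an illustrative helper, not code from this book): half the sum of squared differences between predicted and actual values.

```python
def cost_function(predicted, actual):
    # sum of 1/2 (predicted - actual)^2 over all observations
    return sum(0.5 * (p - a) ** 2 for p, a in zip(predicted, actual))

cost_function([0.9, 0.2], [1.0, 0.0])  # ≈ 0.025
```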
Back Propagation is the process in which this error is propagated back from the output layer through the hidden layers to the input layer, adjusting the weights along the way.
With this, we are done with the basic process flow of an ANN; we will learn how to implement an ANN using R and Python in the following sections.
Let us finally revise all the steps we follow in ANN algorithm:
Step 1: Randomly initialize the weights to small values close to zero (but not equal to zero)
Step 2: Pass the first observation of the dataset through the ANN
Step 3: Forward Propagation: information is passed through the network, starting from the input layer, up to the computation of the cost function
Step 4: Compare the predicted value with the actual value to find the difference, or error
Step 5: Back Propagation: the error is propagated back towards the input layer to adjust the weights; the learning rate decides how large each adjustment is, and therefore how fast we approach the minimum cost
Step 6: Repeat Steps 2-5 over the remaining observations (and further passes over the dataset) until the cost function reaches its minimum, with optimized weights for each input variable
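The steps above can be sketched as a tiny training loop (illustrative, not the book's implementation): one input, one weight, and an identity activation, so the gradient ∂C/∂w = (predicted − actual) × x can be written in closed form. All values here are assumptions for illustration.

```python
import random

random.seed(0)
w = random.uniform(-0.05, 0.05)   # Step 1: small random weight near zero
data = [(1.0, 2.0), (2.0, 4.0)]   # (input, actual value) observations
learning_rate = 0.05

for epoch in range(200):          # Step 6: repeat until cost is minimized
    for x, actual in data:        # Step 2: pass each observation through
        predicted = w * x         # Step 3: forward propagation
        error = predicted - actual      # Step 4: compare predicted vs actual
        grad = error * x                # Step 5: back-propagate the error...
        w -= learning_rate * grad       # ...and adjust the weight

# w converges towards 2.0, the weight that minimizes the cost on this data
```

Note that the random initialization in Step 1 happens only once, before the loop; only Steps 2-5 are repeated.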