# Artificial Neural Network Part 7

0
157 Stochastic Gradient Descent: In our last topic we discussed about Gradient Descent. We learned that Cost Function is convex i.e. it always curves up and it has only a single minimum.

But what if cost function is not a convex function.  Then the algorithm we discussed in our last section might not work well. In that case we need to use a bit tweaked algorithm known as Stochastic Gradient Descent. As we can see in the above figure, convex function has single local minima but in case of non-convex function we have several local minima but we still could have global minimum.

Cost functions can broadly be divided into 2 categories.

Batch Gradient Descent: Batch Gradient Descent algorithm works fine when our cost function is convex. As discussed in our last section we would get single optimized combination of weights which we apply for all the rows in the dataset. But there is problem with this approach and that is if cost function is non-convex we might stuck in local minima rather than finding global minima. As we can see blue ball is stuck in local minima. Why Batch Gradient Descent function stuck in local minima? As we know, minima is the point where slope of curve is zero or gradient is equal to zero and this condition gets satisfied at local minima itself. This is why Batch Gradient Descent function shows convergence at local minima and outputs the final combination of weights.

Stochastic Gradient Descent: The problem of finding local minima is resolved in Stochastic Gradient Descent algorithm. In Stochastic Gradient Descent, rather than finding single optimized combination of weights for whole data set, we try to get optimized weights combination for each of the row in data set. This way we get several optimized combinations of weights and probability of finding global minimum increases.  As we can see, incase of Batch Gradient Descent, we calculate single combination of weights, so we can call Batch Gradient Descent a deterministic algorithm. But Stochastic Gradient Descent calculates optimized weights for each row of the dataset and adds randomness to the algorithm, which helps to find the global minima.