Regularization: Regularization is a common way to control or reduce overfitting in a flexible, tunable manner. Overfitting occurs when a model tries to capture the noise within the data.
For any machine learning algorithm, we can think of the data as having two components: the pattern within the data and the noise within the data.
Noise: Noise is generally meaningless information within the data, and can also be described as corrupt data. In other words, noise refers to data points that do not represent the true properties of the data but arise from random chance.
The goal of any statistical model or machine learning algorithm is to capture the pattern within the data and ignore the noise.
Let us suppose we want to fit a linear regression model to predict the scores for a class of 100 students. We could predict the scores based on the students' past performance. We could divide the students into various categories:
- Male\Female\Family Income
- Male\Female\Family Income\Education of the parents
- Male\Female\Family Income\Education of the parents\Roll Numbers
So, as we increase the number of predictors, we get a better fit of the model to the training data. But in the last option, once we add roll numbers, we might predict the exact score of each student, and the result is an overfit model. An overfit model tends to fail on out-of-period data.
In the last case, where we consider the roll numbers, we overfit the data. One could guess that the fitted equation in this case is far more complex, for example of higher degree than a quadratic. A higher-degree polynomial might give a highly accurate model on the training data but is likely to fail on an out-of-period or hold-out sample.
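The effect described above can be reproduced on synthetic data. The following is a minimal sketch, not the author's experiment: the student data is simulated, the variable names (`income`, `parent_edu`, `junk`) are hypothetical, and 60 columns of pure noise stand in for identifier-like predictors such as roll numbers that carry no real information about the score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated class of 100 students: 70 for training, 30 held out.
n_train, n_test = 70, 30
n = n_train + n_test

# Assumption: scores depend linearly on two real predictors plus noise.
income = rng.normal(size=n)
parent_edu = rng.normal(size=n)
scores = 60 + 5 * income + 3 * parent_edu + rng.normal(scale=4, size=n)

# Assumption: 60 uninformative columns stand in for identifier-like
# predictors (such as roll numbers) that are unrelated to the score.
junk = rng.normal(size=(n, 60))


def design(*columns):
    """Build a design matrix with an intercept column first."""
    return np.column_stack([np.ones(n), *columns])


def fit_and_score(X):
    """Fit ordinary least squares on the training rows and return
    (training MSE, hold-out MSE)."""
    X_tr, X_te = X[:n_train], X[n_train:]
    y_tr, y_te = scores[:n_train], scores[n_train:]
    beta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    train_mse = np.mean((X_tr @ beta - y_tr) ** 2)
    test_mse = np.mean((X_te @ beta - y_te) ** 2)
    return train_mse, test_mse


small_train, small_test = fit_and_score(design(income, parent_edu))
big_train, big_test = fit_and_score(design(income, parent_edu, junk))

print(f"2 predictors : train MSE={small_train:.2f}, hold-out MSE={small_test:.2f}")
print(f"62 predictors: train MSE={big_train:.2f}, hold-out MSE={big_test:.2f}")
```

The larger model always fits the training rows at least as well, because it contains the smaller model as a special case. The interesting comparison is the hold-out error, which is where the extra predictors hurt.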
Regularization: Regularization is one of the most common ways to overcome overfitting. In regularization, we penalize the model for large coefficient estimates, which discourages it from relying on a large number of variables. Basically, we try to reduce the values of the estimates in the model equation. In the process, we try to bring the estimates of insignificant variables as close to zero as possible, or even exactly to zero. Thus, even though a variable remains in the model, its impact is reduced to an insignificant level, which helps prevent overfitting.
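To make the shrinking of estimates concrete, here is a minimal sketch using a squared penalty (the ridge form listed later in this section) and its closed-form solution. The data is simulated, the helper name `ridge_coefficients` is hypothetical, and no intercept is fitted to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: only the first two of five predictors matter.
n, p = 50, 5
X = rng.normal(size=(n, p))
true_beta = np.array([4.0, -2.0, 0.0, 0.0, 0.0])
y = X @ true_beta + rng.normal(scale=1.0, size=n)


def ridge_coefficients(X, y, alpha):
    """Closed-form estimate for a squared penalty:
    solves (X'X + alpha * I) beta = X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


alphas = [0.0, 1.0, 10.0, 100.0, 1000.0]
norms = []
for alpha in alphas:
    beta = ridge_coefficients(X, y, alpha)
    norms.append(np.linalg.norm(beta))
    print(f"alpha={alpha:>7}: coefficients={np.round(beta, 3)}")
```

As `alpha` grows, the overall size of the coefficient vector shrinks toward zero. With a squared penalty the estimates get small but do not generally become exactly zero; exact zeros are typical of the absolute-value penalty.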
There are two common methods of regularization:
- Ridge Regression:
  - Performs L2 regularization, i.e. adds a penalty equal to the sum of the squared coefficients
  - Minimization objective = LS Obj + α * (sum of squares of coefficients)
- Lasso Regression:
  - Performs L1 regularization, i.e. adds a penalty equal to the sum of the absolute values of the coefficients
  - Minimization objective = LS Obj + α * (sum of absolute values of coefficients)
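Here, LS Obj is the least-squares objective, i.e. the sum of squared errors, and α is a tuning parameter that controls the strength of the penalty. We will learn more about Ridge Regression and Lasso Regression in the following sections.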
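The two objectives can be written directly as functions. This is a minimal sketch under the definitions given above; the function names and the tiny data set are made up for illustration, and the intercept is left out for brevity.

```python
import numpy as np


def least_squares_objective(X, y, beta):
    """LS Obj: sum of squared errors between y and the predictions X @ beta."""
    residuals = y - X @ beta
    return float(np.sum(residuals ** 2))


def ridge_objective(X, y, beta, alpha):
    """LS Obj + alpha * (sum of squares of coefficients)."""
    return least_squares_objective(X, y, beta) + alpha * float(np.sum(beta ** 2))


def lasso_objective(X, y, beta, alpha):
    """LS Obj + alpha * (sum of absolute values of coefficients)."""
    return least_squares_objective(X, y, beta) + alpha * float(np.sum(np.abs(beta)))


# Tiny hand-checkable example (hypothetical numbers).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, -2.0])
alpha = 0.5

ls_value = least_squares_objective(X, y, beta)        # residuals [0, 4, 4] -> 32.0
ridge_value = ridge_objective(X, y, beta, alpha)      # 32 + 0.5 * (1 + 4) -> 34.5
lasso_value = lasso_objective(X, y, beta, alpha)      # 32 + 0.5 * (1 + 2) -> 33.5
print(ls_value, ridge_value, lasso_value)
```

Note that libraries may scale the least-squares term differently (for example, dividing it by the number of samples), so a given value of α is not always directly comparable across implementations.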