Ridge regression is a way to create a parsimonious model when the number of predictor variables in a set exceeds the number of observations, or when a data set has multicollinearity.
Although Linear Regression and Logistic Regression are the most commonly used techniques, Ridge Regression is preferred when analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates are unbiased, but their variances are large, so they may be far from the true value.
Why use Ridge Regression: Ridge regression is a technique specialized for analyzing multiple regression data that are multicollinear in nature.
What is Multi-Collinearity: Multicollinearity, or collinearity, is the existence of near-linear relationships among the independent variables.
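As a rough illustration of what a near-linear relationship looks like in practice, the sketch below builds a small synthetic data set in which one predictor is almost a copy of another. The variable names, the noise level of 0.01, and the use of the correlation matrix and condition number as diagnostics are assumptions made for this example, not something the text prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 200

x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)  # x2 is almost a copy of x1 (assumed noise level)
x3 = rng.normal(size=n)              # x3 is unrelated to the other two
X = np.column_stack([x1, x2, x3])

# Pairwise correlations: an off-diagonal entry near +1 or -1 flags a
# near-linear relationship between two predictors.
corr = np.corrcoef(X, rowvar=False)

# Condition number of the centered design matrix: a large value means some
# linear combination of the columns is nearly constant.
X_centered = X - X.mean(axis=0)
cond = np.linalg.cond(X_centered)

print(np.round(corr, 3))
print(f"condition number: {cond:.1f}")
```

In this synthetic case the correlation between the first two columns is very close to 1, while the third column is only weakly correlated with either.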
Regularization Technique: Ridge regression is used to create a parsimonious model in the following scenarios.
- The number of predictor variables in a given set exceeds the number of observations
- The dataset has multicollinearity (that is, correlations between predictor variables).
Regularization techniques work by balancing two goals.
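The first scenario can be sketched numerically. When there are more predictors than observations, the matrix XᵀX has rank at most n and cannot be inverted, so least squares has no unique solution, whereas adding a positive multiple of the identity makes the system solvable. The sizes (5 observations, 10 predictors) and the penalty value `lam = 1.0` below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 10                      # fewer observations than predictors
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

gram = X.T @ X                    # p x p matrix, but its rank is at most n
rank = np.linalg.matrix_rank(gram)
print(f"rank of X^T X: {rank} (needs {p} to be invertible)")

lam = 1.0                         # hypothetical penalty strength
# X^T X + lam * I is positive definite for lam > 0, so this solve succeeds
# even though X^T X alone is singular.
beta_ridge = np.linalg.solve(gram + lam * np.eye(p), X.T @ y)
print(np.round(beta_ridge, 3))
```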
- Penalize the magnitude of coefficients of features
- Minimize the error between the actual and predicted observations
Ridge regression performs L2 regularization: a penalty equal to the sum of the squared coefficients, scaled by λ, is added to the least-squares objective. With y_i the i-th observed response, x_ij the value of the j-th predictor for observation i, and β_j the coefficients, the minimization objective is as follows.
$$\min_{\beta} \; \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{ij}\,\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$
Here λ is the tuning parameter that controls the strength of the penalty term.
If λ = 0, the objective reduces to that of ordinary least squares linear regression, so we get the same coefficients as ordinary least squares.
If λ = ∞, the coefficients will be zero because of the infinite weight on the squared coefficients: any nonzero coefficient makes the objective infinite.
If 0 < λ < ∞, the magnitude of λ determines the relative weight given to the two parts of the objective.
In simple terms, the minimization objective = least-squares objective + λ × (sum of the squared coefficients).
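The three cases for λ can be checked with a small sketch. It relies on the standard closed-form ridge solution β = (XᵀX + λI)⁻¹Xᵀy, which the text above does not state explicitly. The helper name `ridge_fit`, the synthetic data, and the choice to center the data instead of fitting an intercept are assumptions made for this example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y; assumes centered data, no intercept."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
true_beta = np.array([2.0, -1.0, 0.5])      # made-up coefficients for the sketch
y = X @ true_beta + 0.1 * rng.normal(size=50)

# Center predictors and response so that no intercept term is needed.
X = X - X.mean(axis=0)
y = y - y.mean()

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary least squares
beta_zero = ridge_fit(X, y, 0.0)                  # lam = 0: same as OLS
beta_mid = ridge_fit(X, y, 10.0)                  # moderate shrinkage
beta_huge = ridge_fit(X, y, 1e9)                  # stands in for lam -> infinity

for name, beta in [("OLS", beta_ols), ("lam=0", beta_zero),
                   ("lam=10", beta_mid), ("lam=1e9", beta_huge)]:
    print(f"{name:8s} {np.round(beta, 4)}  norm={np.linalg.norm(beta):.4f}")
```

With λ = 0 the result matches the least-squares fit, and as λ grows the coefficient vector shrinks toward zero.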
As ridge regression shrinks the coefficients towards zero, it introduces some bias. But it can reduce the variance to a great extent, which can result in a lower mean-squared error. The amount of shrinkage is controlled by λ, which multiplies the ridge penalty. Because a larger λ means more shrinkage, we get different coefficient estimates for different values of λ.
- Ridge regression estimates tend to be stable in the sense that they are usually little affected by small changes in the data on which the fitted regression is based. In contrast, ordinary least squares estimates may be highly unstable when the independent variables are highly multicollinear.
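The bias-variance trade-off described here can be illustrated with a small simulation. The sketch below keeps one collinear design fixed, redraws the noise many times, and compares the squared bias, variance, and mean-squared error of the coefficient estimates. The true coefficients, the noise level, λ = 1, and the number of trials are all arbitrary choices for this example, and the model is fitted without an intercept for simplicity.

```python
import numpy as np

rng = np.random.default_rng(3)
n, lam, n_trials = 30, 1.0, 500          # assumed sizes and penalty
true_beta = np.array([1.0, 1.0])         # made-up true coefficients

# Fixed design with two nearly collinear predictors.
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])
gram = X.T @ X

ols_est, ridge_est = [], []
for _ in range(n_trials):
    y = X @ true_beta + rng.normal(size=n)          # fresh noise each trial
    ols_est.append(np.linalg.solve(gram, X.T @ y))
    ridge_est.append(np.linalg.solve(gram + lam * np.eye(2), X.T @ y))
ols_est, ridge_est = np.array(ols_est), np.array(ridge_est)

def summarize(est):
    """Return (squared bias, variance, MSE) of the estimates, summed over coefficients."""
    bias_sq = np.sum((est.mean(axis=0) - true_beta) ** 2)
    variance = np.sum(est.var(axis=0))
    return bias_sq, variance, bias_sq + variance

ols_bias_sq, ols_var, ols_mse = summarize(ols_est)
ridge_bias_sq, ridge_var, ridge_mse = summarize(ridge_est)
print(f"OLS   bias^2={ols_bias_sq:.4f} var={ols_var:.4f} mse={ols_mse:.4f}")
print(f"ridge bias^2={ridge_bias_sq:.4f} var={ridge_var:.4f} mse={ridge_mse:.4f}")
```

In this setup the ridge estimates have a much smaller variance than the least-squares estimates, which more than offsets the bias that shrinkage introduces.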
- While multicollinearity does not affect the precision of the estimated responses (and predictions) at the observed points, it does cause variance inflation of estimated responses at other points.
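Both points above can be sketched on a synthetic collinear data set. The first part perturbs the responses slightly and measures how far the fitted coefficients move; the second uses the standard expression xᵀ(XᵀX)⁻¹x for the least-squares prediction variance (in units of the noise variance) to compare the observed points with a point that breaks the collinear pattern. The helper names `fit` and `pred_var`, the perturbation size, and the test point (1, -1) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam = 40, 1.0                               # assumed sample size and penalty

x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)            # nearly collinear with x1
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)
y_perturbed = y + 0.1 * rng.normal(size=n)     # small change in the data

def fit(X, y, lam):
    """Ridge fit without intercept; lam = 0 gives ordinary least squares."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Stability: how far do the coefficients move after the perturbation?
ols_shift = np.linalg.norm(fit(X, y_perturbed, 0.0) - fit(X, y, 0.0))
ridge_shift = np.linalg.norm(fit(X, y_perturbed, lam) - fit(X, y, lam))
print(f"coefficient shift  OLS: {ols_shift:.4f}  ridge: {ridge_shift:.4f}")

# Variance inflation: OLS prediction variance relative to the noise variance.
gram_inv = np.linalg.inv(X.T @ X)

def pred_var(x):
    return x @ gram_inv @ x

var_observed = np.mean([pred_var(row) for row in X])   # mean leverage, equals p / n
var_off_pattern = pred_var(np.array([1.0, -1.0]))      # point where x1 and x2 disagree
print(f"prediction variance  observed: {var_observed:.4f}  off-pattern: {var_off_pattern:.4f}")
```

The observed points follow the x1 ≈ x2 pattern, so their prediction variance stays small, while a point far from that pattern inherits the large uncertainty in the individual coefficients.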