Lasso Regression: LASSO (Least Absolute Shrinkage and Selection Operator) is a regression analysis method that performs both feature selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces.
Let us try to understand LASSO:
Lasso regression is one of the regularization methods that create parsimonious models in the presence of a large number of features, where "large" means either of the following:
- Large enough to increase the tendency of the model to overfit. Even as few as ten variables can cause overfitting.
- Large enough to cause computational challenges. This situation can arise with millions or billions of features.
Lasso regression performs L1 regularization, that is, it adds a penalty equal to the sum of the absolute values of the coefficients. The minimization objective is as follows.
Minimization objective = LS Obj + λ (sum of absolute values of coefficients)
Here LS Obj stands for the Least Squares Objective, which is simply the linear regression objective without regularization, and λ is the tuning parameter that controls the amount of regularization. As λ increases, so does the amount of shrinkage: the bias of the model increases and its variance decreases.
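To make the objective concrete, here is a minimal sketch in plain Python. It assumes the least squares objective is the residual sum of squares with no intercept term; some libraries scale that term differently. The function name `lasso_objective` and the toy data are hypothetical and chosen only for illustration.

```python
def lasso_objective(X, y, coefs, lam):
    """Return LS objective + lam * (sum of absolute coefficient values).

    Assumption: the LS objective is the residual sum of squares, with no
    intercept term.
    """
    rss = 0.0
    for row, target in zip(X, y):
        prediction = sum(x * b for x, b in zip(row, coefs))
        rss += (target - prediction) ** 2
    l1_penalty = sum(abs(b) for b in coefs)
    return rss + lam * l1_penalty


# Toy data: 3 observations, 2 features (illustrative values only).
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
y = [5.0, 4.0, 8.0]
coefs = [2.0, 1.0]

unpenalized = lasso_objective(X, y, coefs, lam=0.0)  # residual sum of squares only
penalized = lasso_objective(X, y, coefs, lam=0.5)    # adds 0.5 * (|2.0| + |1.0|)
```

With `lam=0.0` the function returns only the residual sum of squares; any positive `lam` adds a cost proportional to the total magnitude of the coefficients.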
What does a large coefficient signify?
A large coefficient places heavy emphasis on the corresponding feature, treating it as a strong predictor of the outcome. When a coefficient becomes too large, the algorithm starts modeling intricate relations to compute the output and ends up overfitting to the particular training data. Lasso regression counters this by adding the sum of the absolute values of the coefficients, scaled by λ, to the optimization objective.
Case 1: When λ = 0, the coefficients are the same as those of linear regression without regularization.
Case 2: When λ = ∞, all the coefficients are equal to zero.
Case 3: When 0 < λ < ∞, the coefficients are shrunk toward zero, typically falling between 0 and their unregularized linear regression values.
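The three cases can be checked by hand in the simplest setting: a single feature with no intercept. The sketch below assumes the objective is the residual sum of squares plus λ times |b|. Under that assumption, the minimizer is a soft-thresholded version of the least squares coefficient. The helper names and the data are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_single_feature(x, y, lam):
    """Minimize sum((y_i - b * x_i)^2) + lam * |b| over a single b."""
    xy = sum(xi * yi for xi, yi in zip(x, y))
    xx = sum(xi * xi for xi in x)
    # Setting the derivative to zero gives b = (xy - sign(b) * lam / 2) / xx.
    return soft_threshold(xy, lam / 2.0) / xx


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # exactly y = 2 * x, so the least squares coefficient is 2

b_case1 = lasso_single_feature(x, y, lam=0.0)   # no penalty
b_case3 = lasso_single_feature(x, y, lam=14.0)  # moderate penalty
b_case2 = lasso_single_feature(x, y, lam=1e9)   # stands in for lam = infinity
```

On this data the coefficient goes from 2.0 with no penalty, to 1.5 at λ = 14, to exactly 0 for the very large λ. In fact it is already zero for any λ ≥ 56 here, so a sufficiently large finite λ behaves like λ = ∞.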
The L1 regularization penalty in LASSO causes some coefficients to shrink to exactly zero. As λ increases, more and more coefficients go to zero, and even the coefficients that do not reach zero are generally reduced in magnitude.
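The growing number of zero coefficients can be illustrated with a small solver. This is a minimal sketch of cyclic coordinate descent, one common way to solve the lasso problem; the text above does not say which solver it has in mind. It assumes the objective is the residual sum of squares plus λ times the sum of absolute coefficients, with no intercept. The function names are hypothetical, and the toy data uses mutually orthogonal feature columns so that the results are easy to verify by hand.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(columns, y, lam, sweeps=100):
    """Minimize RSS + lam * sum(|b_j|) by cyclic coordinate descent.

    `columns` holds one list per feature (no intercept is fitted).
    """
    coefs = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Residual with feature j's current contribution left out.
            partial = list(y)
            for k, other in enumerate(columns):
                if k != j:
                    partial = [p - coefs[k] * v for p, v in zip(partial, other)]
            rho = sum(v * p for v, p in zip(col, partial))
            norm = sum(v * v for v in col)
            coefs[j] = soft_threshold(rho, lam / 2.0) / norm
    return coefs


# Toy data with mutually orthogonal feature columns (illustrative only).
columns = [
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
    [1.0, -1.0, -1.0, 1.0],
]
y = [4.2, -2.2, 1.8, -3.8]  # equals 3*col0 + 1*col1 + 0.2*col2

zero_counts = []
for lam in [0.0, 2.0, 10.0, 30.0]:
    coefs = lasso_coordinate_descent(columns, y, lam)
    zero_counts.append(sum(1 for b in coefs if b == 0.0))
```

On this data the weakest feature (unregularized coefficient 0.2) is dropped first, then the one with coefficient 1, and finally the strongest, so `zero_counts` grows with λ. With correlated features the path can be less tidy, which is why this is only an illustration of the general tendency.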