Concept of Multicollinearity


Multicollinearity, or collinearity, is the existence of near-linear relationships among the independent variables. It is a problem we run into when fitting a regression model or another linear model: some predictors are correlated with other predictors in the model.

Moderate multicollinearity may not be problematic. However, severe multicollinearity is a problem because it can increase the variance of the coefficient estimates and make the estimates very sensitive to minor changes in the model. The result is that the coefficient estimates are unstable and difficult to interpret.

Effects of Multicollinearity: Multicollinearity does not reduce the overall fit, predictive power, or reliability of the model as a whole, but it can distort the calculations for individual predictors.

Multicollinearity makes the estimates of the regression coefficients imprecise. It inflates their standard errors and therefore deflates the partial t-statistics for the individual coefficients. The resulting p-values are inflated, so variables that genuinely matter can wrongly appear nonsignificant.
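A small simulation makes the standard-error inflation concrete. This is a minimal sketch assuming NumPy and a made-up model y = 1 + 2·x1 + 3·x2 + noise; the coefficients, sample size, correlation levels and the helper names `ols_with_se` and `simulate` are arbitrary choices for illustration. Every run uses the same random seed, so only the correlation between x1 and x2 changes between runs.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients, standard errors and t-statistics (X includes an intercept column)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # unbiased estimate of the error variance
    se = np.sqrt(sigma2 * np.diag(XtX_inv))   # standard error of each coefficient
    return beta, se, beta / se

def simulate(rho, n=200, seed=0):
    """Simulate y = 1 + 2*x1 + 3*x2 + noise with corr(x1, x2) close to rho, then fit OLS."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)
    # x2 has population correlation rho with x1 and unit variance
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    y = 1 + 2 * x1 + 3 * x2 + rng.standard_normal(n)
    X = np.column_stack([np.ones(n), x1, x2])
    return ols_with_se(X, y)

for rho in (0.0, 0.9, 0.99):
    beta, se, t = simulate(rho)
    print(f"rho={rho:.2f}  se(b1)={se[1]:.3f}  t(b1)={t[1]:6.1f}  "
          f"se(b2)={se[2]:.3f}  t(b2)={t[2]:6.1f}")
```

As the correlation approaches 1, the standard errors grow several-fold and the t-statistics shrink, even though the true coefficients never change.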

In other words, even when multicollinearity is present, a multiple regression model can still perform well when all the variables are considered as a group. But if we try to estimate the change in the dependent variable for a one-unit change in a variable that is linearly correlated with another variable, the result may be inaccurate.
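The contrast between the group and the individual variables can be checked numerically. The sketch below assumes NumPy and a hypothetical setup in which x2 is almost an exact copy of x1 and the true model is y = 3·x1 + 3·x2 + noise. The overall R² barely changes whether x2 is included or not, and the combined effect b1 + b2 is estimated close to 6, while each slope on its own can land far from 3.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)  # x2 is almost an exact copy of x1
y = 3 * x1 + 3 * x2 + rng.standard_normal(n)

def fit_r2(X, y):
    """OLS fit via least squares; returns the coefficients and R^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, r2

ones = np.ones(n)
beta_both, r2_both = fit_r2(np.column_stack([ones, x1, x2]), y)
beta_one, r2_one = fit_r2(np.column_stack([ones, x1]), y)

print(f"R^2 with x1 and x2: {r2_both:.4f}   R^2 with x1 only: {r2_one:.4f}")
print(f"slopes: b1={beta_both[1]:.2f}, b2={beta_both[2]:.2f} (true values 3 and 3)")
print(f"b1 + b2 = {beta_both[1] + beta_both[2]:.3f} (true combined effect 6)")
```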

Mathematical interpretation of the effect of multicollinearity: The usual interpretation of a regression coefficient is that it estimates the effect of a one-unit change in an independent variable X1 while holding the other variables constant. When predictors are correlated, the data contain little variation in X1 that is independent of the other predictors, so this estimate of one variable's impact on the dependent variable, controlling for the others, is less precise than it would be if the predictors were uncorrelated.
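For ordinary least squares this loss of precision can be written explicitly. In the standard formula below, σ² is the error variance, x_ij is the i-th observation of predictor X_j, and R_j² is the R² obtained by regressing X_j on all the other predictors. The second factor is the variance inflation factor (VIF): it equals 1 when X_j is uncorrelated with the other predictors and grows without bound as R_j² approaches 1.

```latex
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}
    \cdot \frac{1}{1 - R_j^2},
\qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

For example, R_j² = 0.9 gives VIF_j = 10, so the variance of that coefficient is ten times larger, and its standard error about 3.2 times larger, than if X_j were uncorrelated with the others.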

Rank of a Matrix: The rank of a matrix is defined as (a) the maximum number of linearly independent column vectors in the matrix or (b) the maximum number of linearly independent row vectors in the matrix.

If there is an exact linear relationship (perfect multicollinearity) among the independent variables, at least one column of the design matrix X (one column per independent variable) is a linear combination of the others. The rank of X is then less than its number of columns, and as a result the matrix X'X used in the regression calculations is non-invertible (singular).

For a non-invertible matrix, the determinant is zero. The least-squares coefficients are computed as (X'X)^-1 X'y, and the inverse involves dividing by the determinant of X'X, so the calculation amounts to a division by zero and cannot be completed.
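The rank deficiency can be observed directly. This is a minimal sketch assuming NumPy, with a hypothetical predictor x3 built as an exact combination 2·x1 − x2. One practical caveat: in floating-point arithmetic the computed determinant usually comes out as a tiny number rather than exactly zero, so numerical software typically diagnoses the problem through the rank (`np.linalg.matrix_rank`, which uses an SVD with a tolerance) or the condition number (`np.linalg.cond`).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
x3 = 2 * x1 - x2                                 # exact linear combination: perfect multicollinearity
X = np.column_stack([np.ones(n), x1, x2, x3])    # design matrix with an intercept column
XtX = X.T @ X

rank = np.linalg.matrix_rank(X)                  # SVD-based rank with a numerical tolerance
print("columns:", X.shape[1], "rank:", rank)
print("det(X'X):", np.linalg.det(XtX))           # zero in exact arithmetic, tiny in floating point
print("cond(X'X):", np.linalg.cond(XtX))         # enormous for a singular matrix
```

The rank is one less than the number of columns, which is exactly the condition under which X'X cannot be inverted.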

Final Notes on Multicollinearity:

  1. Multicollinearity tends to inflate the standard errors of the coefficients. In that case, the test of the hypothesis that a variable's coefficient is equal to zero may fail to reject a false null hypothesis (a Type II error).
  2. Small changes to the input data can lead to large changes in the model.
  3. Multicollinearity can even change the signs of the parameter estimates.
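Points 2 and 3 can be reproduced with a small experiment. The sketch assumes NumPy and a hypothetical model y = x1 + x2 + noise in which x2 is nearly identical to x1; the "small change" is a nudge to each response value with noise one tenth the size of the original noise, repeated 20 times. Individual slopes typically swing by about as much as the true coefficients themselves, so their signs can flip between refits, while their sum hardly moves.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)          # nearly collinear with x1
y = 1.0 * x1 + 1.0 * x2 + rng.standard_normal(n)
X = np.column_stack([np.ones(n), x1, x2])

def fit(y_values):
    """OLS coefficients [intercept, b1, b2] for the fixed design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y_values, rcond=None)
    return beta

base = fit(y)
deltas = []
for _ in range(20):
    # perturb each response by noise with sd 0.1 (10% of the original noise level)
    beta = fit(y + 0.1 * rng.standard_normal(n))
    deltas.append(beta - base)
    print(f"b1={beta[1]:7.2f}  b2={beta[2]:7.2f}  b1+b2={beta[1] + beta[2]:5.2f}")
deltas = np.array(deltas)
```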

Remedies for Multicollinearity: The main solution is to keep only one of two highly correlated independent variables in the regression model.
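One way to apply this remedy systematically is to compute the variance inflation factor (VIF) of each predictor and repeatedly drop the predictor with the largest VIF. The sketch below assumes NumPy. The helper names `vif` and `drop_collinear`, the variable names `income`, `spending` and `age`, and the cutoff of 10 are all hypothetical; a VIF cutoff of 10 is a common rule of thumb, not something this article prescribes. For a packaged version, statsmodels provides `variance_inflation_factor`.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (predictors only; an intercept is added internally)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)               # VIF_j = 1 / (1 - R_j^2)
    return out

def drop_collinear(X, names, threshold=10.0):
    """Drop the predictor with the largest VIF until every VIF is <= threshold."""
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        print(f"dropping {names[worst]} (VIF={v[worst]:.1f})")
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names

# Hypothetical data: 'spending' nearly duplicates 'income'; 'age' is independent.
rng = np.random.default_rng(3)
n = 300
income = rng.standard_normal(n)
spending = 0.95 * income + 0.05 * rng.standard_normal(n)
age = rng.standard_normal(n)
X = np.column_stack([income, spending, age])
X_kept, kept = drop_collinear(X, ["income", "spending", "age"])
print("kept:", kept)
```

Only one of the two near-duplicate predictors survives. Which one is dropped depends on which has the marginally larger VIF, so in practice the choice between them is often better made on subject-matter grounds.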


