Can R-Square be Negative: In statistical modeling, R-Square is referred to as the coefficient of determination. It is the proportion of the variance in the dependent variable that is predictable from the independent variable(s). In essence, R-Square measures how well a linear regression model fits the data compared to simply using the mean of the data.
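The definition above can be sketched as a small function, using the standard formula R² = 1 − SS_res / SS_tot (the function name and sample data are illustrative):

```python
def r_squared(y, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))
    return 1 - ss_res / ss_tot

# Perfect predictions give R² = 1; predicting the mean gives R² = 0.
y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, y))          # 1.0
print(r_squared(y, [2.5] * 4))  # 0.0
```

Note that nothing in the formula prevents SS_res from exceeding SS_tot, which is exactly how R² goes negative.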
In the plot below, the blue line indicates the data for which we try to generate the regression line, and the horizontal red line represents the mean of the data.
When developing a linear regression model, our goal is to find a line that fits the data better than the mean of the data.
For practical purposes, the lowest R² we would normally see is zero, because if the regression line fits no better than the mean, we could simply fall back to predicting the mean. However, if the regression line actually fits worse than the mean, the R² value we calculate will be negative.
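A quick self-contained check illustrates this: if a model's predictions are worse than just predicting the mean, the residual error exceeds the total error and R² comes out negative (the data and the deliberately bad predictions below are made up for illustration):

```python
y = [10.0, 12.0, 14.0, 16.0]
# A badly specified "model" that predicts values far from the data:
bad_pred = [0.0, 0.0, 0.0, 0.0]

mean_y = sum(y) / len(y)                                   # 13.0
ss_tot = sum((yi - mean_y) ** 2 for yi in y)               # error of the mean model
ss_res = sum((yi - yp) ** 2 for yi, yp in zip(y, bad_pred))
r2 = 1 - ss_res / ss_tot
print(r2)  # negative: the bad model fits far worse than the mean
```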
The most common way to end up with a negative R² value is to force the regression line through a specific point, typically by fixing the intercept. Ordinary least squares then produces the line with the lowest possible sum of squared errors among all lines that pass through the specified point.
If the chosen point is the mean of x and y, the resulting line will have the lowest possible sum of squared errors overall, and therefore the highest possible R² value. If we choose the mean of x and y, we cannot get a negative R² value.
However, if we force the regression line through a different point, we still get the line with the lowest sum of squared errors through that point, but that does not mean the line is a good fit.
The intercept for both of the regression lines was set to zero. For the regression line for the blue points, that is not far from the best possible regression line, so the resulting R² value is positive. For the regression line for the red points, however, the true intercept should be around 120, so forcing the intercept to zero pushes the regression line far from where it should be. The result is that the regression's sum of squared errors is greater than that of the mean model, and hence the R² value is negative.
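The red-points scenario can be reproduced in a few lines. A zero-intercept least-squares fit has slope Σxy / Σx²; applied to data whose true intercept is far from zero, the forced line fits worse than the mean and R² is negative (the data below is a made-up example of a line with a large positive intercept and negative slope, not the article's actual plot data):

```python
# Data roughly following y = 120 - 2x, so the true intercept is near 120.
xs = [10.0, 20.0, 30.0, 40.0, 50.0]
ys = [100.0, 80.0, 60.0, 40.0, 20.0]

# Least-squares slope for a line forced through the origin (intercept = 0).
slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
preds = [slope * x for x in xs]

mean_y = sum(ys) / len(ys)
ss_tot = sum((y - mean_y) ** 2 for y in ys)
ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
r2 = 1 - ss_res / ss_tot
print(slope, r2)  # the forced slope is positive while the data trends down, so R² < 0
```

Because the line must pass through (0, 0), the fit ends up sloping upward through downward-trending data, so its squared error exceeds that of the mean model.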