# Linear Regression: Coefficient of Determination

Last Update: February 21, 2022

The coefficient of determination, or r-squared $r^2$, is used to evaluate linear regression goodness of fit by estimating the proportion of the variance of dependent variable $y$ that is explained by its relationship with independent variable $x$. When the linear regression has two or more independent variables $x_{1},...,x_{p}$, it is known as the coefficient of multiple determination or multiple r-squared.

As an example, we can fit a three-variable multiple linear regression (one dependent variable $y$ and two independent variables $x_{1}$, $x_{2}$) with formula $\hat{y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{1i}+\hat{\beta}_{2}x_{2i}\;(1)$. Then, we can estimate its coefficient of multiple determination with formula $r^2=1-\frac{ss_{res}}{ss_{tot}}\;(2)$. The residual sum of squares $ss_{res}$ with formula $ss_{res}=\sum_{i=1}^{n}\hat{e}_{i}^{2}\;(3)$ is estimated as the sum of squared regression residuals $\hat{e}_{i}$. Regression residuals $\hat{e}_{i}$ with formula $\hat{e}_{i}=y_{i}-\hat{y}_{i}\;(4)$ are estimated as the differences between actual $y_{i}$ and fitted $\hat{y}_{i}$ values. The total sum of squares $ss_{tot}$ with formula $ss_{tot}=\sum_{i=1}^{n}(y_{i}-\bar{y})^2\;(5)$ is calculated as the sum of squared differences between dependent variable $y_{i}$ values and their arithmetic mean $\bar{y}$.
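Formulas (1) to (5) can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the helper name `r_squared`, the simulated data, and the "true" coefficients used to generate it are all assumptions made for the example and do not come from the house price data set.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination from actual and fitted values."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    residuals = y - y_hat                 # formula (4)
    ss_res = np.sum(residuals ** 2)       # formula (3)
    ss_tot = np.sum((y - y.mean()) ** 2)  # formula (5)
    return 1.0 - ss_res / ss_tot          # formula (2)


# Simulated data (assumption: made up purely for illustration).
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix with a column of ones for the constant term, formula (1).
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares estimates of beta_0, beta_1, beta_2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

r2 = r_squared(y, y_hat)
print(f"multiple r-squared: {r2:.4f}")
```

For a least-squares fit that includes a constant term, the result should also equal the squared correlation between actual and fitted values, which gives a quick sanity check.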

Adding independent variables $x_{1},...,x_{p}$ to a least-squares regression with a constant term never decreases the coefficient of multiple determination, even when the added variables have little explanatory power. Therefore, we can estimate the adjusted coefficient of multiple determination or adjusted multiple r-squared $\bar{r}^2$ with formula $\bar{r}^2=1-\frac{ss_{res}}{ss_{tot}}\cdot\frac{df_{tot}}{df_{res}}\;(6)$, which takes the reduction in model degrees of freedom into consideration. Total degrees of freedom $df_{tot}$ with formula $df_{tot}=n-1\;(7)$ is the number of observations $n$ minus one for the constant term. Residual degrees of freedom $df_{res}$ with formula $df_{res}=n-p-1\;(8)$ is the number of observations $n$ minus the number of independent variables $p$ minus one for the constant term.
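Formulas (6) to (8) can be sketched as a small Python function. Since formula (2) implies $\frac{ss_{res}}{ss_{tot}}=1-r^2$, the function takes $r^2$ directly. The name `adjusted_r_squared` and the input values in the usage lines are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted r-squared from r-squared, n observations and p independent variables."""
    df_tot = n - 1      # formula (7)
    df_res = n - p - 1  # formula (8)
    if df_res <= 0:
        raise ValueError("need more observations than estimated coefficients")
    # Formula (6), using ss_res / ss_tot = 1 - r2 from formula (2).
    return 1.0 - (1.0 - r2) * df_tot / df_res


# Hypothetical inputs: r2 = 0.5, n = 11 observations, p = 2 independent variables.
# 1 - 0.5 * 10 / 8 = 0.375
print(adjusted_r_squared(0.5, n=11, p=2))
```

Because $df_{tot}/df_{res}\geq 1$, the adjusted value is never larger than $r^2$, and the gap widens as $p$ grows relative to $n$.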

Below, we find an example of estimated coefficients of multiple determination from multiple linear regression of house price explained by its lot size and number of bedrooms [1].

## References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

Source: AER R Package HousePrices Object. Christian Kleiber and Achim Zeileis. (2008). Applied Econometrics with R. Springer-Verlag, New York.
