# Linear Regression: Analysis of Variance (ANOVA) Table

Last Update: February 21, 2022

The analysis of variance (ANOVA) table of a linear regression decomposes the total variation of the dependent variable $y$ into two components: the regression or explained variation, which comes from the model fitted values $\hat{y}$, and the residual or unexplained variation, which comes from the model residuals $\hat{e}$. It is also used to test whether including the independent variables improves the linear regression model relative to a model with only a constant term.
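For a least-squares fit that includes a constant term, this decomposition can be written as the two identities below. Here $ss$ denotes a sum of squares and $df$ the matching degrees of freedom, with the subscripts $tot$, $reg$ and $res$ standing for total, regression and residual; the notation is chosen to match the formulas used in this article.

$$ss_{tot}=ss_{reg}+ss_{res},\qquad df_{tot}=df_{reg}+df_{res}$$

Each row of the ANOVA table reports one of these three pairs, so the total row can always be checked as the sum of the other two.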

As an example, we can fit a multiple linear regression with two independent variables, $\hat{y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{1i}+\hat{\beta}_{2}x_{2i}\;(1)$.

First, we calculate the ANOVA table degrees of freedom: regression degrees of freedom $df_{reg}=p\;(2)$, residual degrees of freedom $df_{res}=n-p-1\;(3)$ and total degrees of freedom $df_{tot}=n-1\;(4)$, where $p$ is the number of independent variables, $n$ is the number of observations and the $1$ accounts for the estimated constant term. In this example, $p=2$.

Next, we calculate the regression sum of squares $ss_{reg}=\sum_{i=1}^{n}(\hat{y}_{i}-\bar{y})^{2}\;(5)$, the residual sum of squares $ss_{res}=\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}=\sum_{i=1}^{n}\hat{e}_{i}^{2}\;(6)$ and the total sum of squares $ss_{tot}=\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}\;(7)$, where $\hat{y}_{i}$ are the regression fitted values, $\bar{y}$ is the dependent variable mean, $y_{i}$ are the dependent variable values and $\hat{e}_{i}$ are the regression residuals.

After that, we divide each sum of squares by its degrees of freedom to obtain the regression mean square $ms_{reg}=\frac{ss_{reg}}{df_{reg}}\;(8)$ and the residual mean square $ms_{res}=\frac{ss_{res}}{df_{res}}\;(9)$.

Finally, we calculate the F-statistic $F=\frac{ms_{reg}}{ms_{res}}\;(10)$ and do an F-test of the joint null hypothesis that the coefficients of the independent variables $x_{1},x_{2}$ are all equal to zero, $H_{0}:\beta_{1}=\beta_{2}=0\;(11)$. The hypothesis concerns the population coefficients $\beta_{1},\beta_{2}$, not their estimates $\hat{\beta}_{1},\hat{\beta}_{2}$. Under the null hypothesis and the usual assumption of normally distributed errors, $F$ follows an F distribution with $df_{reg}$ and $df_{res}$ degrees of freedom. If the joint null hypothesis is rejected, then at least one coefficient differs from zero, so including the independent variables $x_{1}$ and/or $x_{2}$ improves the linear regression model relative to a model with only a constant term.
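The calculations in formulas (1) to (10) can be sketched in plain Python as follows. This is a minimal illustration, not the code used to produce this article's results: the helper names `solve_linear_system` and `anova_table` are hypothetical, the coefficients are obtained from the normal equations with a small Gaussian elimination routine, and the data are made-up toy values rather than the house price data set.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def anova_table(y, xs):
    """Build the ANOVA quantities for y regressed on the columns in xs.

    y  : list of dependent variable values
    xs : list of independent variable columns (each a list as long as y)
    """
    n, p = len(y), len(xs)
    # Design matrix rows: a leading 1 for the constant term, then x values.
    rows = [[1.0] + [col[i] for col in xs] for i in range(n)]
    k = p + 1
    # Normal equations (X'X) beta = X'y give the least-squares coefficients.
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve_linear_system(xtx, xty)

    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]  # (1)
    y_bar = sum(y) / n

    df_reg, df_res, df_tot = p, n - p - 1, n - 1                      # (2)-(4)
    ss_reg = sum((f - y_bar) ** 2 for f in fitted)                    # (5)
    ss_res = sum((yi - f) ** 2 for yi, f in zip(y, fitted))           # (6)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)                       # (7)
    ms_reg = ss_reg / df_reg                                          # (8)
    ms_res = ss_res / df_res                                          # (9)
    f_stat = ms_reg / ms_res                                          # (10)

    return {
        "df_reg": df_reg, "df_res": df_res, "df_tot": df_tot,
        "ss_reg": ss_reg, "ss_res": ss_res, "ss_tot": ss_tot,
        "ms_reg": ms_reg, "ms_res": ms_res, "f_stat": f_stat,
    }


# Toy data (made up for illustration only).
y = [10.0, 12.0, 15.0, 19.0, 24.0, 30.0]
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

table = anova_table(y, [x1, x2])
print(f"{'source':<12}{'df':>4}{'ss':>14}{'ms':>14}{'F':>14}")
print(f"{'regression':<12}{table['df_reg']:>4}{table['ss_reg']:>14.4f}"
      f"{table['ms_reg']:>14.4f}{table['f_stat']:>14.4f}")
print(f"{'residual':<12}{table['df_res']:>4}{table['ss_res']:>14.4f}"
      f"{table['ms_res']:>14.4f}")
print(f"{'total':<12}{table['df_tot']:>4}{table['ss_tot']:>14.4f}")
```

The sketch stops at the F-statistic because the Python standard library has no F distribution. To complete the F-test of hypothesis (11), the p-value could be taken from the upper tail of an F distribution with $df_{reg}$ and $df_{res}$ degrees of freedom, for example with `scipy.stats.f.sf` if SciPy is available.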

Below is an example of an ANOVA table from a multiple linear regression of house price on lot size and number of bedrooms [1].

References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

Source: AER R Package HousePrices Object. Christian Kleiber and Achim Zeileis. (2008). Applied Econometrics with R. Springer-Verlag, New York.
