Last Update: February 21, 2022
Linear Regression: Analysis of Variance (ANOVA) Table in R can be produced with the stats package anova function. The table splits the total variance of the dependent variable into its two components: regression or explained variance, and residual or unexplained variance. It is also used to evaluate whether adding independent variables improved a linear regression model. The main parameter of the anova function is object, which takes a fitted lm object; further lm objects to compare against it are passed as additional arguments. Here, object is the constant or intercept only linear regression and the additional argument is the linear regression model to be evaluated.
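To illustrate the second use, evaluating whether an added independent variable improves a model, the following is a minimal sketch that compares two nested models. It assumes R's built-in mtcars data and its mpg, wt and hp variables, which are not part of this article's example; the object names m_small, m_large and tab are hypothetical.

```r
# Assumption: built-in mtcars data stands in for the article's HousePrices data.
data(mtcars)

# Smaller model: fuel economy explained by weight only.
m_small <- lm(mpg ~ wt, data = mtcars)

# Larger model: adds horsepower as a second independent variable.
m_large <- lm(mpg ~ wt + hp, data = mtcars)

# Row 2 of the table tests whether adding hp reduces the residual sum of squares.
tab <- anova(m_small, m_large)
print(tab)

# With a single added variable, the F statistic equals the squared t value of hp.
t_hp <- summary(m_large)$coefficients["hp", "t value"]
print(all.equal(tab$F[2], t_hp^2))
```

A small p-value in the second row suggests the added variable explains variation that the smaller model leaves in its residuals.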
As an example, we can print the ANOVA table from a multiple linear regression of house price explained by lot size and number of bedrooms, using the HousePrices data object included in the AER package [1].
First, we load the AER package for the data [2].
In [1]:
library(AER)
Second, we create the HousePrices data object from the AER package using the data function and print its first six rows and first three columns using the head function to view the data.frame structure.
In [2]:
data(HousePrices)
head(HousePrices[, 1:3])
Out [2]:
  price lotsize bedrooms
1 42000    5850        3
2 38500    4000        2
3 49500    3060        3
4 60500    6650        3
5 61000    6360        2
6 66000    4160        3
Third, we fit the multiple linear regression using the lm function, store the results in the mlr object and print its summary using the summary.lm function. Within the lm function, parameter formula = price ~ lotsize + bedrooms fits a model where house price is explained by lot size and number of bedrooms.
In [3]:
mlr <- lm(formula = price ~ lotsize + bedrooms, data = HousePrices)
summary.lm(mlr)
Out [3]:
Call:
lm(formula = price ~ lotsize + bedrooms, data = HousePrices)

Residuals:
   Min     1Q Median     3Q    Max
-65665 -12498  -2075   8970  97205

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.613e+03  4.103e+03   1.368    0.172
lotsize     6.053e+00  4.243e-01  14.265  < 2e-16 ***
bedrooms    1.057e+04  1.248e+03   8.470 2.31e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21230 on 543 degrees of freedom
Multiple R-squared:  0.3703,  Adjusted R-squared:  0.3679
F-statistic: 159.6 on 2 and 543 DF,  p-value: < 2.2e-16
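The last line of the summary prints the F-statistic with a p-value truncated to < 2.2e-16. As a minimal sketch of how these values can be read from a summary object and the p-value recomputed, the code below assumes R's built-in mtcars data rather than the article's data; the object names fit, fs and p_value are hypothetical.

```r
# Assumption: built-in mtcars data stands in for the article's HousePrices data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# fstatistic is a named vector: value, numdf (regression df), dendf (residual df).
fs <- summary(fit)$fstatistic
print(fs)

# The upper tail of the F distribution gives the overall p-value.
p_value <- pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE)
print(unname(p_value))
```

The same two calls applied to the mlr object should return the F-statistic and degrees of freedom shown in the summary above.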
Fourth, we fit the constant or intercept only linear regression using the lm function, store the results in the lr1 object and print the multiple linear regression ANOVA table using the anova function. Within the lm function, parameter formula = price ~ 1 fits a regression with house price as dependent variable and a constant as its only term, because the 1 stands for the intercept, which enters the model as a column of ones. Within the anova function, parameter object = lr1 supplies the constant or intercept only linear regression, and mlr, passed as an additional argument, supplies the multiple linear regression to compare against it.
In [4]:
lr1 <- lm(formula = price ~ 1, data = HousePrices)
anova(object = lr1, mlr)
Out [4]:
Analysis of Variance Table

Model 1: price ~ 1
Model 2: price ~ lotsize + bedrooms
  Res.Df        RSS Df  Sum of Sq      F    Pr(>F)
1    545 3.8860e+11
2    543 2.4472e+11  2 1.4389e+11 159.64 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
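The role of the intercept only model in row 1 can be sketched on a small scale. The code below uses just the six house prices printed in Out [2], so its numbers will not match the full data; the object names fit0 and ss_tot are hypothetical. It shows that the design matrix is a column of ones, that the fitted intercept is the sample mean, and that the residual sum of squares of this model is therefore the total sum of squares.

```r
# First six house prices, copied from the head(HousePrices[, 1:3]) output.
price <- c(42000, 38500, 49500, 60500, 61000, 66000)

# Intercept only regression: the 1 in the formula is the constant.
fit0 <- lm(price ~ 1)

# The design matrix is a single column of ones.
print(model.matrix(fit0))

# The estimated intercept is the sample mean of price.
print(coef(fit0))
print(mean(price))

# Its residual sum of squares equals the total sum of squares around the mean.
ss_tot <- sum((price - mean(price))^2)
print(all.equal(deviance(fit0), ss_tot))
```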
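  Res.Df  RSS     Df      Sum of Sq  F       Pr(>F)
1 df_tot  ss_tot
2 df_res  ss_res  df_reg  ss_reg     f_stat  f_pval
Table 1. Analysis of Variance Table Output Description.
In Table 1, df_tot and ss_tot are the total degrees of freedom and sum of squares, df_res and ss_res are the residual degrees of freedom and sum of squares, df_reg and ss_reg are the regression degrees of freedom and sum of squares, and f_stat and f_pval are the F statistic and its p-value.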
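The second row of the table can be rebuilt from the first two columns. As a rough check rather than a replacement for the anova function, the sketch below copies the rounded values from Out [4] and uses the names from Table 1; r_squared is an extra, hypothetical name added for comparison with Out [3]. Because the inputs are rounded, the results agree with the printed table only up to rounding.

```r
# Values copied from rows 1 and 2 of the ANOVA table in Out [4].
df_tot <- 545
ss_tot <- 3.8860e+11
df_res <- 543
ss_res <- 2.4472e+11

# The regression (explained) part is the difference between the two rows.
df_reg <- df_tot - df_res
ss_reg <- ss_tot - ss_res

# F statistic: regression mean square divided by residual mean square.
f_stat <- (ss_reg / df_reg) / (ss_res / df_res)

# p-value: upper tail of the F distribution with df_reg and df_res degrees of freedom.
f_pval <- pf(f_stat, df_reg, df_res, lower.tail = FALSE)

# Share of total sum of squares explained, comparable to Multiple R-squared in Out [3].
r_squared <- ss_reg / ss_tot

print(c(df_reg = df_reg, ss_reg = ss_reg, f_stat = f_stat, r_squared = r_squared))
print(f_pval)
```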
References
[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.
Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.
[2] AER R Package: Christian Kleiber and Achim Zeileis. (2008). Applied Econometrics with R. Springer-Verlag, New York.