Last Update: February 21, 2022
Linear Regression: Analysis of Variance (ANOVA) Table in R can be done using the stats package anova function. The table splits the total variation of the dependent variable into two components: regression (explained) variation and residual (unexplained) variation. It is also used to evaluate whether adding independent variables improves a linear regression model. The main parameter of the anova function is object, which here receives the constant (intercept-only) linear regression lm object, followed by the lm object of the linear regression model to be evaluated.
As an example, we can print the ANOVA table from a multiple linear regression of house price explained by lot size and number of bedrooms, using the HousePrices data object included in the AER package [1].
First, we load the AER package, which provides the data [2].
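The decomposition described above can be written out in terms of sums of squares. This is a sketch in notation introduced here for illustration only: y_i are the observed values, \hat{y}_i the fitted values, \bar{y} the sample mean, n the number of observations and k the number of independent variables. The identity assumes an ordinary least squares fit that includes an intercept.

```latex
\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{\text{total}}
  = \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{\text{regression (explained)}}
  + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{\text{residual (unexplained)}}

F = \frac{\text{regression sum of squares} / k}
         {\text{residual sum of squares} / (n - k - 1)}
```

A large F value indicates that the independent variables explain much more variation per degree of freedom than is left in the residuals.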
In [1]:
library(AER)
Second, we load the HousePrices data object from the AER package using the data function and print the first six rows and first three columns using the head function to view the data.frame structure.
In [2]:
data(HousePrices)
head(HousePrices[, 1:3])
Out [2]:
  price lotsize bedrooms
1 42000    5850        3
2 38500    4000        2
3 49500    3060        3
4 60500    6650        3
5 61000    6360        2
6 66000    4160        3
Third, we fit the multiple linear regression using the lm function, store the results in the mlr object and print its summary using the summary.lm function. Within the lm function, the parameter formula = price ~ lotsize + bedrooms fits a model where house price is explained by lot size and number of bedrooms.
In [3]:
mlr <- lm(formula = price ~ lotsize + bedrooms, data = HousePrices)
summary.lm(mlr)
Out [3]:
Call:
lm(formula = price ~ lotsize + bedrooms, data = HousePrices)

Residuals:
   Min     1Q Median     3Q    Max
-65665 -12498  -2075   8970  97205

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.613e+03  4.103e+03   1.368    0.172
lotsize     6.053e+00  4.243e-01  14.265  < 2e-16 ***
bedrooms    1.057e+04  1.248e+03   8.470 2.31e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21230 on 543 degrees of freedom
Multiple R-squared:  0.3703,  Adjusted R-squared:  0.3679
F-statistic: 159.6 on 2 and 543 DF,  p-value: < 2.2e-16
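The F-statistic on the last line of this summary is the same overall test that the ANOVA table reports. As a minimal sketch, it can be pulled out of the summary object and its p-value recomputed with the pf function. The block assumes the AER package is installed and refits the model so that it runs on its own; the names fs and f_pval are chosen here for illustration.

```r
# Reload the data and refit the model so this block is self-contained
# (assumes the AER package is installed).
data("HousePrices", package = "AER")
mlr <- lm(price ~ lotsize + bedrooms, data = HousePrices)

# fstatistic is a named numeric vector: value, numdf, dendf
fs <- summary(mlr)$fstatistic

# Upper-tail probability of the F distribution gives the p-value
f_pval <- pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]], lower.tail = FALSE)

print(fs)
print(f_pval)
```

The p-value is far below the smallest value R prints directly, which is why the summary shows it as < 2.2e-16.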
Fourth, we fit the constant (intercept-only) linear regression using the lm function, store the results in the lr1 object and print the multiple linear regression ANOVA table using the anova function. Within the lm function, the parameter formula = price ~ 1 fits a regression of house price on a constant only, because the intercept corresponds to a column of ones in the design matrix. Within the anova function, object = lr1 supplies the intercept-only model and the second argument, mlr, supplies the multiple linear regression to be compared against it.
In [4]:
lr1 <- lm(formula = price ~ 1, data = HousePrices)
anova(object = lr1, mlr)
Out [4]:
Analysis of Variance Table

Model 1: price ~ 1
Model 2: price ~ lotsize + bedrooms
  Res.Df        RSS Df  Sum of Sq      F    Pr(>F)
1    545 3.8860e+11
2    543 2.4472e+11  2 1.4389e+11 159.64 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  Res.Df RSS    Df     Sum of Sq F      Pr(>F)
1 df_tot ss_tot
2 df_res ss_res df_reg ss_reg    f_stat f_pval
Table 1. Analysis of Variance Table Output Description.
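Each entry named in Table 1 can be reproduced by hand from the two fitted models. This is a sketch rather than a description of how anova works internally. It assumes the AER package is installed and refits both models so that it runs on its own. The variable names follow Table 1, while tab and r_squared are names chosen here for illustration.

```r
# Reload the data and refit both models so this block is self-contained
# (assumes the AER package is installed).
data("HousePrices", package = "AER")
lr1 <- lm(price ~ 1, data = HousePrices)
mlr <- lm(price ~ lotsize + bedrooms, data = HousePrices)

# Row 1: intercept-only model, whose residual sum of squares is the total
df_tot <- df.residual(lr1)
ss_tot <- deviance(lr1)

# Row 2: residual part of the multiple linear regression
df_res <- df.residual(mlr)
ss_res <- deviance(mlr)

# Regression part is the difference between the two rows
df_reg <- df_tot - df_res
ss_reg <- ss_tot - ss_res

# F statistic and its upper-tail p-value
f_stat <- (ss_reg / df_reg) / (ss_res / df_res)
f_pval <- pf(f_stat, df_reg, df_res, lower.tail = FALSE)

# Share of total variation explained, i.e. Multiple R-squared
r_squared <- ss_reg / ss_tot

# Compare the hand-computed values with the anova output
tab <- anova(lr1, mlr)
print(all.equal(f_stat, tab$F[2]))
print(all.equal(ss_reg, tab[2, "Sum of Sq"]))
print(all.equal(r_squared, summary(mlr)$r.squared))
```

The deviance function returns the residual sum of squares of an lm object, so applying it to the intercept-only model gives the total sum of squares directly.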
References
[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.
Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.
[2] AER R Package: Christian Kleiber and Achim Zeileis. (2008). Applied Econometrics with R. Springer-Verlag, New York.