# Linear Regression: Analysis of Variance (ANOVA) Table in Python

Last Update: February 21, 2022

An analysis of variance (ANOVA) table for linear regression can be produced in Python with the `anova_lm` function from the `statsmodels` package, available as `statsmodels.api.stats.anova_lm`. The table breaks the total variance of the dependent variable into its two components: regression (explained) variance and residual (unexplained) variance. It is also used to evaluate whether adding independent variables improved a linear regression model. The main parameters of `anova_lm` are `args`, the fitted model results (here, the constant or intercept only regression followed by the regression to be evaluated); `test`, the test statistic to include; and `typ`, the ANOVA test type.
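The variance decomposition described above can be checked by hand on a toy data set. The following is a minimal sketch in plain Python, assuming made-up `x` and `y` values (not the house price data) and a one-predictor regression fitted with the closed-form least squares formulas. It only illustrates the identity and is not how `anova_lm` is implemented.

```python
# Toy data (hypothetical values chosen for illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(y)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Closed-form simple linear regression coefficients
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean
fitted = [intercept + slope * xi for xi in x]

# Total sum of squares: also the residual sum of squares of an
# intercept only model, whose fitted value is always the mean of y
ss_tot = sum((yi - y_mean) ** 2 for yi in y)

# Regression (explained) and residual (unexplained) sums of squares
ss_reg = sum((fi - y_mean) ** 2 for fi in fitted)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# The two components add up to the total (up to floating point error)
print(ss_tot, ss_reg + ss_res)
```

This is why the intercept only regression is passed to `anova_lm` as the baseline: its residual sum of squares plays the role of the total sum of squares.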

As an example, we print the ANOVA table from a multiple linear regression of house price explained by lot size and number of bedrooms, using the `HousePrices` data set included in the `AER` R package.

First, we import the `statsmodels` package for data downloading, multiple linear regression fitting and ANOVA table estimation.

``````In :
import statsmodels.api as sm
import statsmodels.formula.api as smf
``````

Second, we create the `houseprices` data object using the `get_rdataset` function and display the first five rows and first three columns of the data using the `print` function and the `head` data frame method to view its structure.

``````In :
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
print(houseprices.iloc[:, 0:3].head())
``````
``````Out :
     price  lotsize  bedrooms
0  42000.0     5850         3
1  38500.0     4000         2
2  49500.0     3060         3
3  60500.0     6650         3
4  61000.0     6360         2
``````

Third, we fit the multiple linear regression with the `ols` function using variables in the `houseprices` data object, store the results in the `mlr` object and print its summary using the `summary` method. Within the `ols` function, the parameter `formula="price ~ lotsize + bedrooms"` fits a model where house price is explained by lot size and number of bedrooms.

``````In :
mlr = smf.ols(formula="price ~ lotsize + bedrooms", data=houseprices).fit()
print(mlr.summary())
``````
``````Out :
                            OLS Regression Results
==============================================================================
Dep. Variable:                  price   R-squared:                       0.370
Method:                 Least Squares   F-statistic:                     159.6
Date:                Mon, 08 Nov 2021   Prob (F-statistic):           2.95e-55
Time:                        19:08:52   Log-Likelihood:                -6213.1
No. Observations:                 546   AIC:                         1.243e+04
Df Residuals:                     543   BIC:                         1.245e+04
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   5612.5997   4102.819      1.368      0.172   -2446.741    1.37e+04
lotsize        6.0530      0.424     14.265      0.000       5.219       6.887
bedrooms    1.057e+04   1247.676      8.470      0.000    8116.488     1.3e+04
==============================================================================
Omnibus:                       77.789   Durbin-Watson:                   1.193
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              146.854
Skew:                           0.833   Prob(JB):                     1.29e-32
Kurtosis:                       4.919   Cond. No.                     2.60e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.6e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
``````
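Several figures in the summary above are tied together by simple arithmetic. As a rough check, using the rounded values as printed (so results agree only approximately), the residual degrees of freedom equal the number of observations minus the number of estimated parameters, and each t statistic is the coefficient divided by its standard error.

```python
# Values copied from the printed summary (rounded as displayed)
n_obs = 546
df_model = 2  # two slopes: lotsize and bedrooms

# Residual degrees of freedom: observations minus slopes minus intercept
df_resid = n_obs - df_model - 1
print(df_resid)  # 543

# (coefficient, standard error) pairs from the coefficient table
coefs = {
    "Intercept": (5612.5997, 4102.819),
    "lotsize": (6.0530, 0.424),
    "bedrooms": (1.057e4, 1247.676),
}

# t statistic = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se) in coefs.items()}
for name, t in t_stats.items():
    print(name, round(t, 2))
```

Small differences from the `t` column in the summary are expected, because the printed coefficients and standard errors are themselves rounded.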

Fourth, we fit a constant or intercept only linear regression using the `ols` function and store its results in the `lr1` object. We then estimate the multiple linear regression ANOVA table using the `anova_lm` function, store its results in the `anova` object and print them. Within the `ols` function, the parameter `formula="price ~ 1"` fits a constant or intercept only linear regression with house price as the dependent variable, because the constant or intercept corresponds to a column of ones. Within the `anova_lm` function, the parameter `test="F"` requests an F-test and `typ="I"` requests an ANOVA Type I test. Notice that `test="F"` and `typ="I"` were only included as educational examples and can be modified according to your needs. Note, however, that when several fitted models are passed, as here, `statsmodels` supports only Type I.

``````In :
lr1 = smf.ols(formula="price ~ 1", data=houseprices).fit()
anova = sm.stats.anova_lm(lr1, mlr, test="F", typ="I")
print(anova)
``````
``````Out :
   df_resid           ssr  df_diff       ss_diff           F        Pr(>F)
0     545.0  3.886028e+11      0.0           NaN         NaN           NaN
1     543.0  2.447151e+11      2.0  1.438877e+11  159.636705  2.954867e-55
``````
``````
   df_resid           ssr  df_diff       ss_diff           F        Pr(>F)
0    df_tot        ss_tot
1    df_res        ss_res   df_reg        ss_reg      f_stat        f_pval
``````

`Table 1. Analysis of Variance Table Output Description.`
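The entries named in Table 1 are linked by a few identities, which can be verified from the printed ANOVA output. The sketch below copies the displayed values by hand (so it agrees with the output only up to rounding) and reuses the names from Table 1; `r_squared` is an extra name introduced here for illustration.

```python
# Values copied from the printed ANOVA table
df_tot, ss_tot = 545.0, 3.886028e11  # row 0: intercept only model
df_res, ss_res = 543.0, 2.447151e11  # row 1: evaluated model

# Regression degrees of freedom and sum of squares are the differences
df_reg = df_tot - df_res
ss_reg = ss_tot - ss_res

# F statistic: regression mean square over residual mean square
f_stat = (ss_reg / df_reg) / (ss_res / df_res)

# Share of total variance explained by the regression
r_squared = ss_reg / ss_tot

print(df_reg, ss_reg)
print(round(f_stat, 2), round(r_squared, 3))
```

The resulting F statistic and R-squared should match the `F-statistic` and `R-squared` values reported in the regression summary, up to rounding.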

Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

statsmodels Python package: Seabold, S., and Perktold, J. (2010). statsmodels: Econometric and statistical modeling with Python. Proceedings of the 9th Python in Science Conference.
