Heteroskedasticity: Breusch-Pagan and White Tests in Python

Last Update: February 21, 2022

The Breusch-Pagan test for heteroskedasticity can be run in Python with the het_breuschpagan function from the statsmodels.stats.diagnostic module of the statsmodels package. It evaluates whether the independent variables of a linear regression explain the variance of its errors. The main parameters of het_breuschpagan are resid, which holds the model residuals, and exog_het, which holds the independent variables used to explain the error variance.
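
To make the mechanics concrete, here is a minimal sketch of the auxiliary regression behind the test, written with numpy only. It uses simulated data rather than the HousePrices data, and it computes the studentized n·R² form of the statistic, which we assume matches the default behaviour of het_breuschpagan. All variable names are illustrative.

```python
import numpy as np

# Simulated data (an assumption for illustration, not the HousePrices data):
# the error standard deviation grows with x1, so the errors are heteroskedastic.
rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(1.0, 10.0, size=n)
x2 = rng.normal(size=n)
y = 2.0 + 0.5 * x1 - 1.0 * x2 + rng.normal(size=n) * x1

# Design matrix with a constant in the first column.
X = np.column_stack([np.ones(n), x1, x2])

# Step 1: fit the original regression by least squares and keep the residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Step 2: auxiliary regression of the squared residuals on the same variables.
u2 = resid ** 2
gamma, *_ = np.linalg.lstsq(X, u2, rcond=None)
fitted = X @ gamma
r_squared = 1.0 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)

# Step 3: LM statistic = n * R^2, compared against a chi-squared distribution
# with (number of columns - 1) degrees of freedom, here 2.
lm = n * r_squared
print("lm:", lm)
```

A large LM statistic means the independent variables explain a meaningful share of the squared residuals, which is evidence against constant error variance.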

The White test (cross terms) for heteroskedasticity can be run in Python with the het_white function from the same statsmodels.stats.diagnostic module. It evaluates whether the independent variables of a linear regression, their squares and their pairwise products explain the variance of its errors. The main parameters of het_white are resid, which holds the model residuals, and exog, which holds the independent variables used to explain the error variance. The squared independent variables and their products are built from exog automatically and included in the auxiliary regression of the test. Notice that when products of independent variables (cross terms) are included in the auxiliary regression, the test evaluates both heteroskedasticity and the specification of the model equation.
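
As a rough illustration of which columns enter the White auxiliary regression, the sketch below builds every pairwise product of the columns of a small design matrix. The toy values and names are hypothetical, and the construction is our assumption about how the squares and cross terms are formed, not a copy of the library code.

```python
import numpy as np

# Hypothetical toy design matrix: constant, x1, x2 (values chosen arbitrarily).
X = np.array([
    [1.0, 2.0, 5.0],
    [1.0, 3.0, 1.0],
    [1.0, 4.0, 2.0],
    [1.0, 6.0, 3.0],
])
names = ["const", "x1", "x2"]

# Every pair (i, j) with i <= j: levels come from const * x, squares from
# x * x, and the cross term from x1 * x2.
i_idx, j_idx = np.triu_indices(X.shape[1])
aux = X[:, i_idx] * X[:, j_idx]
aux_names = [f"{names[i]}*{names[j]}" for i, j in zip(i_idx, j_idx)]

print(aux_names)
print(aux.shape)
```

With a constant and two independent variables this gives six columns, so the LM statistic would be compared against a chi-squared distribution with five degrees of freedom.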

As an example, we run the Breusch-Pagan and White (cross terms) tests on a multiple linear regression of house price explained by lot size and number of bedrooms, using the HousePrices data set included in the AER R package [1].

First, we import the statsmodels modules used for downloading the data, fitting the multiple linear regression, adding a constant to the independent variables object, and running the Breusch-Pagan and White tests [2].

In [1]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.tools.tools as smt
import statsmodels.stats.diagnostic as smd

Second, we create the houseprices data object with the get_rdataset function and print its first five rows and first three columns with the head data frame method to view its structure.

In [2]:
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
print(houseprices.iloc[:, 0:3].head())
Out [2]:
     price  lotsize  bedrooms
0  42000.0     5850         3
1  38500.0     4000         2
2  49500.0     3060         3
3  60500.0     6650         3
4  61000.0     6360         2

Third, we fit the multiple linear regression with the ols function using the variables in the houseprices data object and store the results in the mlr object. Within the ols function, the parameter formula="price ~ lotsize + bedrooms" specifies a model where house price is explained by lot size and number of bedrooms.

In [3]:
mlr = smf.ols(formula="price ~ lotsize + bedrooms", data=houseprices).fit()

Fourth, we create the independent variables object ivar, add a constant as its first column with the add_constant function, store the result in the ivarc object and print its first five rows to view its structure.

In [4]:
ivar = houseprices.iloc[:, 1:3]
ivarc = smt.add_constant(data=ivar, prepend=True)
print(ivarc.head())
Out [4]:
   const  lotsize  bedrooms
0    1.0     5850         3
1    1.0     4000         2
2    1.0     3060         3
3    1.0     6650         3
4    1.0     6360         2

Fifth, we run the Breusch-Pagan test with the het_breuschpagan function, store the results in the bptest object and print its lm Lagrange multiplier test statistic and lm_pvalue Lagrange multiplier test p-value. Within the het_breuschpagan function, the parameter resid=mlr.resid passes the residuals of the mlr model and exog_het=ivarc passes the independent variables with added constant from the ivarc object, which are used to explain the error variance.

In [5]:
bptest = smd.het_breuschpagan(resid=mlr.resid, exog_het=ivarc)
print("lm:", bptest[0], "lm_pvalue:", bptest[1])
Out [5]:
lm: 66.22180390630272 lm_pvalue: 4.169826556412853e-15
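
To read this result, the LM statistic is compared against a chi-squared distribution with two degrees of freedom, one per independent variable excluding the constant. With two degrees of freedom the upper-tail probability has the closed form exp(-x/2), so the p-value can be checked with the standard library alone. The 5% significance level below is an assumption chosen for illustration.

```python
import math

lm = 66.22180390630272   # LM statistic printed above
alpha = 0.05             # significance level (an assumption; pick your own)

# With 2 degrees of freedom the chi-squared upper-tail probability
# has the closed form exp(-x / 2).
p_value = math.exp(-lm / 2.0)

print("p-value:", p_value)
print("reject homoskedasticity:", p_value < alpha)
```

The p-value is far below 0.05, so the null hypothesis of homoskedastic errors is rejected for this model.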

Sixth, we run the White test (cross terms) with the het_white function, store the results in the wtest object and print its lm Lagrange multiplier test statistic and lm_pvalue Lagrange multiplier test p-value. Within the het_white function, the parameter resid=mlr.resid passes the residuals of the mlr model and exog=ivarc passes the independent variables with added constant from the ivarc object, which are used to explain the error variance.

In [6]:
wtest = smd.het_white(resid=mlr.resid, exog=ivarc)
print("lm:", wtest[0], "lm_pvalue:", wtest[1])
Out [6]:
lm: 67.32394020191877 lm_pvalue: 3.6903715066545815e-13

References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

[2] statsmodels Python package: Seabold, Skipper, and Josef Perktold. (2010). “statsmodels: Econometric and statistical modeling with python.” Proceedings of the 9th Python in Science Conference.
