# Heteroskedasticity: Breusch-Pagan and White Tests in Python

Last Update: February 21, 2022

The Breusch-Pagan heteroskedasticity test can be run in Python with the `het_breuschpagan` function from the `statsmodels.stats.diagnostic` module of the `statsmodels` package. It evaluates whether the independent variables of a linear regression explain the variance of its errors. The main parameters of `het_breuschpagan` are `resid`, the model residuals, and `exog_het`, the independent variables used to explain the error variance.

The White heteroskedasticity test (cross terms) can be run in Python with the `het_white` function from the `statsmodels.stats.diagnostic` module of the `statsmodels` package. It evaluates whether the independent variables of a linear regression, their squares and their pairwise products explain the variance of its errors. The main parameters of `het_white` are `resid`, the model residuals, and `exog`, the independent variables used to explain the error variance. The squares and products of the variables passed in `exog` are included automatically in the test's auxiliary regression. Notice that because the products (cross terms) of the independent variables are included in the auxiliary regression, the test evaluates both heteroskedasticity and model equation specification.
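As a sketch of what the two tests estimate, assume a model with two independent variables $x_1$ and $x_2$, $n$ observations and OLS residuals $\hat{u}_i$ (this notation is introduced here only for illustration). The auxiliary regressions and the studentized (Koenker) form of the Lagrange multiplier statistic can then be written as:

```latex
% Breusch-Pagan auxiliary regression
\hat{u}_i^2 = \gamma_0 + \gamma_1 x_{1i} + \gamma_2 x_{2i} + v_i

% White (cross terms) auxiliary regression
\hat{u}_i^2 = \delta_0 + \delta_1 x_{1i} + \delta_2 x_{2i}
            + \delta_3 x_{1i}^2 + \delta_4 x_{2i}^2
            + \delta_5 x_{1i} x_{2i} + w_i

% Lagrange multiplier statistic, where R^2_{aux} is the R-squared of the
% auxiliary regression and q its number of regressors excluding the constant
% (q = 2 for Breusch-Pagan, q = 5 for White in this two-variable case)
LM = n \, R^2_{aux} \;\sim\; \chi^2_q \quad \text{under the null hypothesis of homoskedasticity}
```

A small `lm_pvalue` therefore indicates that the auxiliary regressors explain the squared residuals, which is evidence against constant error variance.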

As an example, we run the Breusch-Pagan and White (cross terms) tests on a multiple linear regression of house price explained by lot size and number of bedrooms, using data from the `HousePrices` object of the `AER` R package.

First, we import the `statsmodels` package for downloading the data, fitting the multiple linear regression, adding a constant to the independent variables object, and running the Breusch-Pagan and White tests.

``````In :
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.tools.tools as smt
import statsmodels.stats.diagnostic as smd``````

Second, we create the `houseprices` data object using the `get_rdataset` function and display the first five rows and first three columns of the data using the `print` function and the `head` data frame method to view its structure.

``````In :
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
# Show the first five rows of the first three columns
print(houseprices.iloc[:, 0:3].head())``````
``````Out :
     price  lotsize  bedrooms
0  42000.0     5850         3
1  38500.0     4000         2
2  49500.0     3060         3
3  60500.0     6650         3
4  61000.0     6360         2``````

Third, we fit the multiple linear regression with the `ols` function using variables from the `houseprices` data object and store the results in the `mlr` object. Within the `ols` function, the parameter `formula="price ~ lotsize + bedrooms"` fits a model where house price is explained by lot size and number of bedrooms.

``````In :
mlr = smf.ols(formula="price ~ lotsize + bedrooms", data=houseprices).fit()``````

Fourth, we create the independent variables object `ivar`, add a constant as its first column with the `add_constant` function, store the result in the `ivarc` object and print the first five rows of the data frame to view its structure.

``````In :
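ivar = houseprices.iloc[:, 1:3]
# Add a constant column at the first position
ivarc = smt.add_constant(data=ivar, prepend=True)
print(ivarc.head())``````
``````Out :
   const  lotsize  bedrooms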
0    1.0     5850         3
1    1.0     4000         2
2    1.0     3060         3
3    1.0     6650         3
4    1.0     6360         2``````

Fifth, we run the Breusch-Pagan test using the `het_breuschpagan` function, store the results in the `bptest` object and print its `lm` Lagrange multiplier test statistic and `lm_pvalue` Lagrange multiplier test p-value, which are the first two elements of the returned tuple. Within the `het_breuschpagan` function, the parameter `resid=mlr.resid` passes the `mlr` model residuals and `exog_het=ivarc` passes the `ivarc` object, the independent variables with added constant used to explain the error variance.

``````In :
bptest = smd.het_breuschpagan(resid=mlr.resid, exog_het=ivarc)
# bptest is a tuple: (lm, lm_pvalue, fvalue, f_pvalue)
print("lm:", bptest[0], "lm_pvalue:", bptest[1])``````
``````Out :
lm: 66.22180390630272 lm_pvalue: 4.169826556412853e-15``````

Sixth, we run the White test (cross terms) using the `het_white` function, store the results in the `wtest` object and print its `lm` Lagrange multiplier test statistic and `lm_pvalue` Lagrange multiplier test p-value, which are the first two elements of the returned tuple. Within the `het_white` function, the parameter `resid=mlr.resid` passes the `mlr` model residuals and `exog=ivarc` passes the `ivarc` object, the independent variables with added constant used to explain the error variance.

``````In :
wtest = smd.het_white(resid=mlr.resid, exog=ivarc)
# wtest is a tuple: (lm, lm_pvalue, fvalue, f_pvalue)
print("lm:", wtest[0], "lm_pvalue:", wtest[1])``````
``````Out :
lm: 67.32394020191877 lm_pvalue: 3.6903715066545815e-13``````

 Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

 statsmodels Python package: Seabold, Skipper, and Josef Perktold. (2010). “statsmodels: Econometric and statistical modeling with python.” Proceedings of the 9th Python in Science Conference.
