Last Update: February 21, 2022
Heteroskedasticity: the Breusch-Pagan test can be done in Python using the het_breuschpagan function from the statsmodels.stats.diagnostic module of the statsmodels package. It evaluates whether the independent variables of a linear regression explain the variance of its errors. The main parameters of het_breuschpagan are resid, which takes the model residuals, and exog_het, which takes the independent variables used to explain the variance of the model errors.
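To make the idea of "independent variables explaining the errors variance" concrete, here is a minimal sketch of the auxiliary regression behind the Breusch-Pagan test, written with NumPy only. It uses synthetic data (an assumption for illustration, not the data used later in this post) and computes the statistic in its n times R-squared form, which to our understanding is the form het_breuschpagan reports by default in recent statsmodels versions. The variable names are ours.

```python
import numpy as np

# Synthetic data (an assumption for illustration): the error standard
# deviation grows with x1, so the errors are heteroskedastic by construction.
rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(1.0, 10.0, size=n)
x2 = rng.uniform(1.0, 10.0, size=n)
y = 2.0 + 0.5 * x1 + 1.5 * x2 + rng.normal(0.0, 1.0, size=n) * x1

# Design matrix with a constant in the first column.
X = np.column_stack([np.ones(n), x1, x2])

# Main regression: OLS coefficients and residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Auxiliary regression: squared residuals on the same regressors.
u2 = resid ** 2
gamma, *_ = np.linalg.lstsq(X, u2, rcond=None)
fitted = X @ gamma
r2 = 1.0 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)

# Lagrange multiplier statistic: observations times auxiliary R-squared.
lm = n * r2
print("auxiliary R-squared:", r2, "lm:", lm)
```

A large lm relative to a chi-square distribution with 2 degrees of freedom (the number of regressors excluding the constant) points towards heteroskedasticity.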
Heteroskedasticity: the White test (cross terms) can be done in Python using the het_white function from the statsmodels.stats.diagnostic module of the statsmodels package. It evaluates whether the independent variables of a linear regression, their squares and their pairwise products explain the variance of its errors. The main parameters of het_white are resid, which takes the model residuals, and exog, which takes the independent variables used to explain the variance of the model errors. The squares and pairwise products of the variables passed in exog are automatically included in the auxiliary regression of the test. Notice that when products of independent variables (cross terms) are included in the auxiliary regression, the test evaluates both heteroskedasticity and model equation specification.
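The regressors of the White auxiliary regression can be sketched in plain Python as follows. This is only an illustration of the construction described above, not the statsmodels source code: it multiplies every pair of columns (each column with itself included) of a small design matrix that already contains a constant. The two data rows and the column labels are placeholders chosen for the illustration.

```python
from itertools import combinations_with_replacement

names = ["const", "lotsize", "bedrooms"]
# Two illustrative rows: constant, lot size, number of bedrooms.
rows = [
    [1.0, 5850.0, 3.0],
    [1.0, 4000.0, 2.0],
]

# Every pair of columns (i <= j), including a column paired with itself.
pairs = list(combinations_with_replacement(range(len(names)), 2))
aux_names = [f"{names[i]}*{names[j]}" for i, j in pairs]
aux_rows = [[row[i] * row[j] for i, j in pairs] for row in rows]

print(aux_names)
print(aux_rows[0])
```

Products with the constant reproduce the constant and the original variables, so with two independent variables the auxiliary regression has six columns: the constant, two levels, two squares and one cross term.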
As an example, we can do Breusch-Pagan and White (cross terms) tests on a multiple linear regression of house price explained by lot size and number of bedrooms, using the HousePrices data object included in the AER R package [1].
First, we import the statsmodels package for data downloading, multiple linear regression fitting, adding a constant to the independent variables object, and the Breusch-Pagan and White tests [2].
In [1]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.tools.tools as smt
import statsmodels.stats.diagnostic as smd
Second, we create the houseprices data object using the get_rdataset function and display the first five rows and first three columns of data using the print function and the head data frame method to view its structure.
In [2]:
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
print(houseprices.iloc[:, 0:3].head())
Out [2]:
     price  lotsize  bedrooms
0  42000.0     5850         3
1  38500.0     4000         2
2  49500.0     3060         3
3  60500.0     6650         3
4  61000.0     6360         2
Third, we fit a multiple linear regression with the ols function using variables within the houseprices data object and store the results within the mlr object. Within the ols function, the parameter formula="price ~ lotsize + bedrooms" fits a model where house price is explained by lot size and number of bedrooms.
In [3]:
mlr = smf.ols(formula="price ~ lotsize + bedrooms", data=houseprices).fit()
Fourth, we create the independent variables object, add a constant to it as the first column with the add_constant function and print the first five rows of the data frame to view its structure.
In [4]:
ivar = houseprices.iloc[:, 1:3]
ivarc = smt.add_constant(data=ivar, prepend=True)
print(ivarc.head())
Out [4]:
   const  lotsize  bedrooms
0    1.0     5850         3
1    1.0     4000         2
2    1.0     3060         3
3    1.0     6650         3
4    1.0     6360         2
Fifth, we do the Breusch-Pagan test using the het_breuschpagan function, store the results within the bptest object and print its lm Lagrange multiplier test statistic and lm_pvalue Lagrange multiplier test p-value. Within the het_breuschpagan function, the parameter resid=mlr.resid passes the mlr model residuals and exog_het=ivarc passes the ivarc object, which holds the independent variables for explaining the variance of the model errors with an added constant.
In [5]:
bptest = smd.het_breuschpagan(resid=mlr.resid, exog_het=ivarc)
print("lm:", bptest[0], "lm_pvalue:", bptest[1])
Out [5]:
lm: 66.22180390630272 lm_pvalue: 4.169826556412853e-15
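As a quick sanity check on these two numbers, we can recompute the p-value by hand. Under the null hypothesis the lm statistic is asymptotically chi-square distributed, and here we assume 2 degrees of freedom (the two independent variables, excluding the constant). For 2 degrees of freedom the upper-tail probability has the closed form exp(-x / 2), so only the standard library is needed.

```python
import math

# Values copied from Out [5].
lm = 66.22180390630272
lm_pvalue = 4.169826556412853e-15

# Chi-square with 2 degrees of freedom (assumed here):
# the upper-tail probability is exp(-x / 2).
p_closed_form = math.exp(-lm / 2)
print("closed form p-value:", p_closed_form)
print("matches lm_pvalue:", math.isclose(p_closed_form, lm_pvalue, rel_tol=1e-6))
```

The closed form applies only to 2 degrees of freedom, so it does not carry over to the White test, whose auxiliary regression has more regressors.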
Sixth, we do the White test (cross terms) using the het_white function, store the results within the wtest object and print its lm Lagrange multiplier test statistic and lm_pvalue Lagrange multiplier test p-value. Within the het_white function, the parameter resid=mlr.resid passes the mlr model residuals and exog=ivarc passes the ivarc object, which holds the independent variables for explaining the variance of the model errors with an added constant.
In [6]:
wtest = smd.het_white(resid=mlr.resid, exog=ivarc)
print("lm:", wtest[0], "lm_pvalue:", wtest[1])
Out [6]:
lm: 67.32394020191877 lm_pvalue: 3.6903715066545815e-13
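To read these results, recall that the null hypothesis of both tests is homoskedasticity (constant error variance), so a p-value below the chosen significance level rejects it. The helper below is a hypothetical convenience function written for this illustration, and the 5% level is an assumption rather than part of either test.

```python
# p-values copied from Out [5] and Out [6].
bp_pvalue = 4.169826556412853e-15
white_pvalue = 3.6903715066545815e-13


def rejects_homoskedasticity(pvalue, alpha=0.05):
    """Return True when the null of constant error variance is rejected at level alpha."""
    return pvalue < alpha


for name, pvalue in [("Breusch-Pagan", bp_pvalue), ("White", white_pvalue)]:
    decision = "reject" if rejects_homoskedasticity(pvalue) else "fail to reject"
    print(f"{name}: {decision} homoskedasticity at the 5% level")
```

Both p-values are far below 0.05, so both tests reject homoskedasticity for this regression. For the White test with cross terms, a rejection can also reflect a misspecified model equation.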
References
[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.
Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.
[2] statsmodels Python package: Seabold, Skipper, and Josef Perktold. (2010). “statsmodels: Econometric and statistical modeling with python.” Proceedings of the 9th Python in Science Conference.