Last Update: April 24, 2022
Exogeneity: Wu-Hausman and Sargan Tests in Python can be done using the wu_hausman method and the wooldridge_regression and sargan properties of IV2SLS model results from the linearmodels.iv.model module. These tests evaluate whether linear regression independent variables are not correlated with the error term (exogenous) and whether instrumental variables are not correlated with the two stage least squares linear regression error term (valid instruments). The main parameters of the IV2SLS class are dependent with the model dependent variable, exog with the model exogenous independent variables, endog with the model endogenous independent variables and instruments with the model instrumental variables.
As an example, we can do Wu-Hausman, Wu-Hausman (Wooldridge) and Sargan tests from the original multiple linear regression of house price explained by its lot size and number of bedrooms, treating lot size as endogenous and using whether the house has a driveway and its number of garage places as instrumental variables, with data included within the AER R package HousePrices object.
First, we import the statsmodels package for data downloading and ordinary least squares original model fitting, and the linearmodels package for two stage least squares model fitting and the Wu-Hausman, Wu-Hausman (Wooldridge) and Sargan tests.
In :
import statsmodels.api as sm
import statsmodels.formula.api as smf
import linearmodels.iv.model as lm
Second, we create the houseprices data object using the get_rdataset function and display the first five rows of the first three columns together with the sixth and eleventh columns using the head data frame method to view its structure.
In :
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
print(houseprices.iloc[:, list(range(3)) + [5] + [10]].head())
Out :
     price  lotsize  bedrooms driveway  garage
0  42000.0     5850         3      yes       1
1  38500.0     4000         2      yes       0
2  49500.0     3060         3      yes       0
3  60500.0     6650         3      yes       0
4  61000.0     6360         2      yes       0
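The driveway column stores yes/no text labels rather than numbers. Depending on the installed pandas and linearmodels versions, a text-valued column may not be accepted as an instrument, so one possible workaround is to recode it as a 1/0 dummy before fitting. The sketch below assumes that a yes = 1, no = 0 coding is what you want; the small data frame and its values are hypothetical and only mirror the column names shown above. Labels not listed in the mapping would become NaN with Series.map, so check for missing values after recoding.

```python
import pandas as pd

# Hypothetical rows that only mirror the column names of the houseprices data
sample = pd.DataFrame({
    "price": [50000.0, 40000.0, 60000.0],
    "lotsize": [5000, 4000, 6000],
    "bedrooms": [3, 2, 3],
    "driveway": ["yes", "no", "yes"],
    "garage": [1, 0, 2],
})

# Map the text labels to a 1/0 dummy so every instrument column is numeric
sample["driveway"] = sample["driveway"].map({"yes": 1, "no": 0})

# Unmapped labels would show up here as missing values
print(sample["driveway"].isna().sum())
print(sample.dtypes)
```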
Third, we fit the original model with the ols function using variables within the houseprices data object and store the outcome within the mlr1 object. Within the ols function, the parameter formula="price ~ lotsize + bedrooms" fits a model where house price is explained by its lot size and number of bedrooms.
In : mlr1 = smf.ols(formula="price ~ lotsize + bedrooms", data=houseprices).fit()
Fourth, we create the mdatac model data object and add a constant column using the add_constant function. Within the add_constant function, the parameter data=houseprices passes the houseprices data object and prepend=False is a logical value to add the constant as the last column of the mdatac data object. Then, we fit the two stage least squares model with the IV2SLS class using variables within the mdatac data object and store the outcome within the mlr2 object. Within the IV2SLS class, the parameter dependent=mdatac["price"] includes the model house price dependent variable, exog=mdatac[["const", "bedrooms"]] includes the model constant and number of bedrooms exogenous independent variables, endog=mdatac["lotsize"] includes the model lot size endogenous independent variable and instruments=mdatac[["driveway", "garage"]] includes the model whether house has a driveway and number of garage places instrumental variables. Within the fit method, the parameter cov_type="homoskedastic" requests homoskedastic variance covariance matrix estimation and debiased=True is a logical value to adjust the model variance covariance matrix estimation for degrees of freedom. Notice that the fit method parameter debiased=True was only included as an educational example which can be modified according to your needs. Also, notice that estimating the two stage least squares model stage by stage with the ols function, instead of estimating both stages simultaneously, would produce correct coefficients but incorrect standard errors and F-statistic, because the second stage residuals would be computed with fitted instead of actual lot size values. Additionally, notice that the mlr2 two stage least squares model estimation assumes errors are homoskedastic.
In :
mdatac = sm.add_constant(data=houseprices, prepend=False)
mlr2 = lm.IV2SLS(dependent=mdatac["price"], exog=mdatac[["const", "bedrooms"]], endog=mdatac["lotsize"],
                 instruments=mdatac[["driveway", "garage"]]).fit(cov_type="homoskedastic", debiased=True)
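To see why stage by stage estimation gives the right coefficients but the wrong standard errors, here is a minimal NumPy sketch on simulated data. The data-generating process, variable names and seed are assumptions for illustration, not the HousePrices data. It computes two stage least squares coefficients in one step with (X' Pz X)^{-1} X' Pz y and again stage by stage, then compares the error variance estimates that standard errors are built from.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: z holds two instruments, w is exogenous, x is endogenous
z = rng.normal(size=(n, 2))
w = rng.normal(size=n)
u = rng.normal(size=n)                                        # structural error
x = z @ np.array([1.0, 0.5]) + 0.8 * u + rng.normal(size=n)   # x correlated with u
y = 1.0 + 2.0 * w + 3.0 * x + u

const = np.ones(n)
X = np.column_stack([const, w, x])   # regressors: exogenous + endogenous
Z = np.column_stack([const, w, z])   # all exogenous variables + instruments

# One step: project X on Z, then solve (X' Pz X) beta = X' Pz y
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta_iv = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

# Stage by stage: first stage fitted x, then OLS of y on [const, w, x_hat]
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
X2 = np.column_stack([const, w, x_hat])
beta_stages = np.linalg.lstsq(X2, y, rcond=None)[0]

# Correct residuals use actual x; naive second stage residuals use x_hat
resid_correct = y - X @ beta_iv
resid_naive = y - X2 @ beta_stages

k = X.shape[1]
sigma2_correct = resid_correct @ resid_correct / (n - k)
sigma2_naive = resid_naive @ resid_naive / (n - k)

print("coefficients match:", np.allclose(beta_iv, beta_stages))
print("error variance, correct vs naive:", sigma2_correct, sigma2_naive)
```

The coefficients agree, but the naive error variance is inflated because its residuals absorb the first stage error, so any standard errors and F-statistic scaled by it are off.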
Fifth, we can print Wu-Hausman test results using
In : print(mlr2.wu_hausman())
Out :
Wu-Hausman test of exogeneity
H0: All endogenous variables are exogenous
Statistic: 50.9308
P-value: 0.0000
Distributed: F(1,542)
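The Wu-Hausman statistic can be understood through an augmented regression: regress the endogenous variable on all exogenous variables and instruments, add the first stage residuals to the original regression, and F-test whether their coefficient is zero. Below is a minimal NumPy sketch of that textbook construction, not necessarily linearmodels' exact implementation; it assumes one endogenous regressor, homoskedastic errors and simulated data with hypothetical names. With n = 546 observations and k = 3 original regressors, its F(1, n - k - 1) degrees of freedom give F(1, 542), the same distribution reported above.

```python
import numpy as np

def ols_rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid @ resid

def wu_hausman_f(y, exog, endog, instruments):
    """Augmented regression Wu-Hausman F statistic for one endogenous regressor."""
    n = y.shape[0]
    Z = np.column_stack([exog, instruments])
    # First stage residuals of endog on all exogenous variables and instruments
    v_hat = endog - Z @ np.linalg.lstsq(Z, endog, rcond=None)[0]
    X_r = np.column_stack([exog, endog])   # restricted: original regressors
    X_u = np.column_stack([X_r, v_hat])    # unrestricted: add first stage residuals
    rss_r, rss_u = ols_rss(X_r, y), ols_rss(X_u, y)
    q = 1
    df_denom = n - X_u.shape[1]            # n - k - q
    f_stat = ((rss_r - rss_u) / q) / (rss_u / df_denom)
    return f_stat, (q, df_denom)

rng = np.random.default_rng(1)
n = 546
z = rng.normal(size=(n, 2))              # hypothetical instruments
w = rng.normal(size=n)                   # hypothetical exogenous regressor
u = rng.normal(size=n)                   # structural error
exog = np.column_stack([np.ones(n), w])

x_endog = z @ np.array([1.0, 0.5]) + 0.8 * u + rng.normal(size=n)  # shares u
x_exog = z @ np.array([1.0, 0.5]) + rng.normal(size=n)             # independent of u

results = {}
for label, x in [("endogenous", x_endog), ("exogenous", x_exog)]:
    y = 1.0 + 2.0 * w + 3.0 * x + u
    results[label] = wu_hausman_f(y, exog, x, z)
    print(label, "F =", round(results[label][0], 2), "df =", results[label][1])
```

A large F rejects the null hypothesis that the suspected variable is exogenous; for a p-value, the statistic can be compared with an F distribution, for example with scipy.stats.f.sf.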
Sixth, we can print Wu-Hausman (Wooldridge) test results using
In : print(mlr2.wooldridge_regression)
Out :
Wooldridge's regression test of exogeneity
H0: Endogenous variables are exogenous
Statistic: 50.9046
P-value: 0.0000
Distributed: chi2(1)
Seventh, we can print Sargan test results using
In : print(mlr2.sargan)
Out :
Sargan's test of overidentification
H0: The model is not overidentified.
Statistic: 0.0477
P-value: 0.8271
Distributed: chi2(1)
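Sargan's statistic is commonly computed as n times the R² from regressing the two stage least squares residuals on all exogenous variables and instruments. It follows a chi-squared distribution with degrees of freedom equal to the number of instruments minus the number of endogenous variables, which is 2 - 1 = 1 in this example, matching chi2(1) above. A minimal NumPy sketch on simulated data follows; all names, coefficients and the seed are assumptions for illustration, and linearmodels' internal computation may differ in detail.

```python
import numpy as np

def tsls(y, X, Z):
    """Two stage least squares coefficients (X' Pz X)^{-1} X' Pz y."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # Pz X
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

def sargan(y, X, Z):
    """Sargan statistic: n times the R^2 of 2SLS residuals regressed on Z."""
    n = y.shape[0]
    resid = y - X @ tsls(y, X, Z)                      # residuals use actual X
    fitted = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
    # Residuals have mean zero because a constant is included, so this is n * R^2
    stat = n * (fitted @ fitted) / (resid @ resid)
    df = Z.shape[1] - X.shape[1]                       # overidentifying restrictions
    return stat, df

rng = np.random.default_rng(2)
n = 500
z = rng.normal(size=(n, 2))                            # two hypothetical instruments
w = rng.normal(size=n)                                 # hypothetical exogenous regressor
u = rng.normal(size=n)                                 # structural error
x = z @ np.array([1.0, 0.5]) + 0.8 * u + rng.normal(size=n)

X = np.column_stack([np.ones(n), w, x])
Z = np.column_stack([np.ones(n), w, z])

results = {
    # Valid: instruments affect y only through x
    "valid": sargan(1.0 + 2.0 * w + 3.0 * x + u, X, Z),
    # Invalid: second instrument also enters y directly, so it is correlated with the error
    "invalid": sargan(1.0 + 2.0 * w + 3.0 * x + z[:, 1] + u, X, Z),
}
for label, (stat, df) in results.items():
    print(label, "Sargan =", round(stat, 2), "chi2 df =", df)
```

A small statistic with a large p-value, as in the mlr2 output above, gives no evidence against instrument validity; a large statistic suggests at least one instrument is correlated with the error term.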
My online courses are hosted at Teachable website.
For more details on this concept, you can view my Linear Regression in Python Course.
Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.
Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.
statsmodels Python package: Seabold, S., and Perktold, J. (2010). “statsmodels: Econometric and statistical modeling with python”. Proceedings of the 9th Python in Science Conference.
linearmodels Python package: Sheppard, K. (2021). “Linear (regression) models for Python. Extends statsmodels with Panel regression, instrumental variable estimators, system estimators and models for estimating asset prices”. Python package version 4.25.