Last Update: February 21, 2022
Omitted Variable Bias: Wald Test in Python can be done using the statsmodels package wald_test method, which is available on fitted regression results such as those returned by the ols function from the statsmodels.formula.api module. It evaluates whether independent variables omitted from a linear regression help explain the dependent variable. The main parameters of wald_test are r_matrix, a string stating the null hypothesis on the omitted independent variables, and use_f, a logical value selecting whether an F-test (True) or a chi-square test (False) is done.
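As a minimal sketch of how these two parameters interact, the example below fits a regression on synthetic data and runs the same null hypothesis with use_f=True and use_f=False. The variables x1, x2 and y and the data-generating coefficients are assumptions made up for this illustration; they are not the HousePrices data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only (not the HousePrices data).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1.0 + 2.0 * df["x1"] + 0.5 * df["x2"] + rng.normal(size=n)

fit = smf.ols(formula="y ~ x1 + x2", data=df).fit()

# Same null hypothesis (coefficient on x2 is zero), two reference distributions.
f_res = fit.wald_test(r_matrix="x2 = 0", use_f=True)      # F-test
chi2_res = fit.wald_test(r_matrix="x2 = 0", use_f=False)  # chi-square test

# np.squeeze copes with statsmodels versions that return 1x1 arrays.
f_stat = float(np.squeeze(f_res.statistic))
chi2_stat = float(np.squeeze(chi2_res.statistic))
f_pvalue = float(np.squeeze(f_res.pvalue))
chi2_pvalue = float(np.squeeze(chi2_res.pvalue))

print(f"F statistic:    {f_stat:.4f}, p-value: {f_pvalue:.3g}")
print(f"chi2 statistic: {chi2_stat:.4f}, p-value: {chi2_pvalue:.3g}")
```

With a single restriction the two statistics coincide, and only the reference distribution used for the p-value differs. Depending on the statsmodels version, wald_test may emit a FutureWarning about the shape of the returned statistic.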
As an example, we run an omitted variable Wald test on the number of bathrooms, starting from an unrestricted multiple linear regression of house price explained by its lot size, number of bedrooms and number of bathrooms, using the HousePrices data object included within the AER R package [1].
First, we import the statsmodels package for data downloading, multiple linear regression fitting and the Wald test [2].
In [1]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
Second, we create the houseprices data object using the get_rdataset function, and display the first five rows and first four columns of the data using the print function and the head data frame method to view its structure.
In [2]:
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
print(houseprices.iloc[:, 0:4].head())
Out [2]:
     price  lotsize  bedrooms  bathrooms
0  42000.0     5850         3          1
1  38500.0     4000         2          1
2  49500.0     3060         3          1
3  60500.0     6650         3          1
4  61000.0     6360         2          1
Third, we fit the unrestricted multiple linear regression with the ols function using variables within the houseprices data object, and store the results within the mlr object. Within the ols function, parameter formula="price ~ lotsize + bedrooms + bathrooms" fits the unrestricted model where house price is explained by its lot size, number of bedrooms and number of bathrooms.
In [3]:
mlr = smf.ols(formula="price ~ lotsize + bedrooms + bathrooms", data=houseprices).fit()
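A hypothesis string refers to coefficients by the names the fitted model assigns them, so it can help to list those names before writing a restriction. The sketch below does this on synthetic data; the variables x1, x2 and y are assumptions made up for this illustration, not part of the HousePrices data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only (not the HousePrices data).
rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1.0 + 2.0 * df["x1"] + 0.5 * df["x2"] + rng.normal(size=n)

fit = smf.ols(formula="y ~ x1 + x2", data=df).fit()

# Coefficient names, in estimation order; the formula interface adds
# an intercept term automatically.
param_names = list(fit.params.index)
print(param_names)

# Estimated coefficients, indexed by those names.
print(fit.params)
```

Applied to the mlr object above, the same fit.params lookup would list the names available for a restriction such as "bathrooms = 0".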
Fourth, we run the Wald test using the wald_test method of the fitted mlr results, store the results within the waldtest object and print them. Within wald_test, parameter r_matrix="bathrooms = 0" states the null hypothesis that the coefficient on the omitted independent variable, number of bathrooms, is zero, and use_f=True does an F-test. Notice that the unrestricted mlr model results and the wald_test parameter use_f=True were only included as educational examples, which can be modified according to your needs.
In [4]:
waldtest = mlr.wald_test(r_matrix="bathrooms = 0", use_f=True)
print(waldtest)
Out [4]:
<F test: F=array([[122.41268574]]), p=8.544987755751257e-26, df_denom=542, df_num=1>
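With the classical (non-robust) OLS covariance, the Wald F statistic for excluding variables is algebraically the same as the F statistic that compares the sums of squared residuals of the restricted and unrestricted models. The NumPy sketch below illustrates this for a single exclusion restriction on synthetic data. The variables x1, x2 and y and the helper fit_ols are assumptions made up for this illustration; it does not reproduce the HousePrices output.

```python
import numpy as np

# Synthetic data for illustration only (not the HousePrices data).
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)


def fit_ols(X, y):
    """Hypothetical helper: return OLS coefficients and sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)


# Unrestricted model includes x2; restricted model omits it.
X_u = np.column_stack([np.ones(n), x1, x2])
X_r = np.column_stack([np.ones(n), x1])
beta_u, ssr_u = fit_ols(X_u, y)
_, ssr_r = fit_ols(X_r, y)

J = 1                        # number of restrictions (df_num)
df_denom = n - X_u.shape[1]  # residual degrees of freedom of the unrestricted model

# F statistic from the change in the sum of squared residuals.
f_ssr = ((ssr_r - ssr_u) / J) / (ssr_u / df_denom)

# Wald form: (R b)' [R V R']^{-1} (R b) / J with V = s^2 (X'X)^{-1}.
sigma2 = ssr_u / df_denom
cov = sigma2 * np.linalg.inv(X_u.T @ X_u)
R = np.array([[0.0, 0.0, 1.0]])  # picks out the coefficient on x2
Rb = R @ beta_u
f_wald = float(Rb @ np.linalg.inv(R @ cov @ R.T) @ Rb) / J

print(f_ssr, f_wald, df_denom)
```

The two statistics agree up to floating-point error, and df_denom plays the same role as the df_denom value reported in the wald_test output.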
References
[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.
Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.
[2] statsmodels Python package: Seabold, S., and Perktold, J. (2010). statsmodels: Econometric and statistical modeling with Python. Proceedings of the 9th Python in Science Conference.