
Exogeneity: Wu-Hausman and Sargan Tests in R

Last Update: March 24, 2022

Exogeneity can be assessed in R with the Wu-Hausman and Sargan tests, using the ivreg function from the AER package together with its summary method. The Wu-Hausman (Wooldridge) test evaluates whether the independent variables of a linear regression are uncorrelated with the error term (exogenous). The Sargan test evaluates whether the instrumental variables are uncorrelated with the error term of the two stage least squares regression (valid instruments). The main parameters of the ivreg function are formula and data. The formula has the form y ~ x1 + x2 | x2 + z1 + z2: the part before the bar is the original model, with x1 as endogenous independent variable and x2 as exogenous independent variable; the part after the bar lists the first stage regressors, namely the exogenous independent variable x2 and the instrumental variables z1 and z2. The data parameter is a data.frame object containing the model variables. The main parameters of the summary method for ivreg are object, the fitted ivreg model with its instrumental variables and two stage least squares estimation, and diagnostics, a logical value that controls whether the Wu-Hausman (Wooldridge) and Sargan test results are printed.

As an example, we run the Wu-Hausman (Wooldridge) and Sargan tests on a multiple linear regression of house price explained by lot size and number of bedrooms, with whether the house has a driveway and its number of garage places as instrumental variables. The data come from the HousePrices object included in the AER package [1].

First, we load the AER package, which provides the data, two stage least squares estimation, and the Wu-Hausman (Wooldridge) and Sargan tests [2].

In [1]:
library(AER)

Second, we load the HousePrices data object from the AER package with the data function. To view the data.frame structure, we use the head function to print the first six rows of the first three columns together with the sixth and eleventh columns.

In [2]:
data(HousePrices)
head(HousePrices[, c(1:3, 6, 11)])
Out [2]:
  price lotsize bedrooms driveway garage
1 42000    5850        3      yes      1
2 38500    4000        2      yes      0
3 49500    3060        3      yes      0
4 60500    6650        3      yes      0
5 61000    6360        2      yes      0
6 66000    4160        3      yes      0
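
Before fitting anything, it can help to confirm the column types, since driveway is printed as yes/no rather than as a number. The following is a minimal sketch of such a check; the helper name vars is an assumption introduced here for illustration and is not part of the original walkthrough.

```r
library(AER)
data(HousePrices)

# Hypothetical helper: the five columns used in this example.
vars <- c("price", "lotsize", "bedrooms", "driveway", "garage")

# Show the type of each column.
str(HousePrices[, vars])

# driveway is a factor, so the formula interface expands it into a dummy column.
levels(HousePrices$driveway)
head(model.matrix(~ driveway + garage, data = HousePrices))
```

If driveway shows up as a factor with two levels, it enters the instrument list as a single dummy variable, so the model has two instruments (driveway and garage) for one endogenous regressor.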

Third, we fit the original model with the lm function using variables from the HousePrices data object and store the result in the mlr1 object. Within the lm function, the parameter formula = price ~ lotsize + bedrooms fits the original model, where house price is explained by lot size and number of bedrooms.

In [3]:
mlr1 <- lm(formula = price ~ lotsize + bedrooms, data = HousePrices)

Fourth, we fit the two stage least squares model with the ivreg function using variables from the HousePrices data object and store the result in the mlr2 object. Within the ivreg function, the parameter formula = price ~ lotsize + bedrooms | bedrooms + driveway + garage has two parts. The part before the bar is the original model, where house price is explained by lot size (endogenous independent variable) and number of bedrooms (exogenous independent variable). The part after the bar lists the first stage regressors: number of bedrooms (exogenous independent variable), plus whether the house has a driveway and its number of garage places (instrumental variables). Notice that estimating the two stages one at a time with the lm function, instead of jointly with ivreg, would give the correct coefficients but incorrect standard errors and F-statistic.

In [4]:
mlr2 <- ivreg(formula = price ~ lotsize + bedrooms | bedrooms + driveway + garage, data = HousePrices)
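
The remark about stage by stage estimation can be checked directly. The sketch below estimates the two stages by hand with lm and compares the result with ivreg; the object names stage1, stage2, hp, lotsize_hat and comparison are assumptions introduced for illustration only.

```r
library(AER)
data(HousePrices)

# Reference fit: both stages handled jointly by ivreg.
mlr2 <- ivreg(price ~ lotsize + bedrooms | bedrooms + driveway + garage,
              data = HousePrices)

# Stage 1: regress the endogenous regressor on the exogenous regressor
# and the instruments.
stage1 <- lm(lotsize ~ bedrooms + driveway + garage, data = HousePrices)
hp <- transform(HousePrices, lotsize_hat = fitted(stage1))

# Stage 2: replace lotsize with its first stage fitted values.
stage2 <- lm(price ~ lotsize_hat + bedrooms, data = hp)

# Side by side: coefficients and standard errors from both approaches.
comparison <- cbind(
  coef_ivreg  = coef(mlr2),
  coef_manual = coef(stage2),
  se_ivreg    = sqrt(diag(vcov(mlr2))),
  se_manual   = sqrt(diag(vcov(stage2)))
)
print(comparison)
```

The two coefficient columns should agree, while the standard error columns should not: the manual second stage computes its residuals from lotsize_hat instead of the observed lotsize, so its residual variance is not the one two stage least squares requires.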

Fifth, we run the Wu-Hausman (Wooldridge) and Sargan tests through the summary method for ivreg objects. Within summary, the parameter object = mlr2 passes the fitted mlr2 model, and diagnostics = TRUE is the logical value that prints the Wu-Hausman (Wooldridge) and Sargan test results, along with a weak instruments test. Notice that the two stage least squares mlr2 model estimation assumes homoskedastic errors unless a heteroskedasticity consistent variance covariance matrix estimator is used within summary for ivreg.

In [5]:
summary(object = mlr2, diagnostics = TRUE)
Out [5]:
Call:
ivreg(formula = price ~ lotsize + bedrooms | bedrooms + driveway + 
    garage, data = HousePrices)

Residuals:
    Min      1Q  Median      3Q     Max 
-115962  -11520    2287   14482   85515 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -19130.16    6540.67  -2.925  0.00359 ** 
lotsize         12.52       1.24  10.096  < 2e-16 ***
bedrooms      7680.13    1574.09   4.879  1.4e-06 ***

Diagnostic tests:
                 df1 df2 statistic  p-value    
Weak instruments   2 542    54.403  < 2e-16 ***
Wu-Hausman         1 542    50.905 3.12e-12 ***
Sargan             1  NA     0.048    0.827    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 25370 on 543 degrees of freedom
Multiple R-Squared: 0.1009,	Adjusted R-squared: 0.09763 
Wald test: 91.52 on 2 and 543 DF,  p-value: < 2.2e-16
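
In the output above, the Wu-Hausman p-value is far below 0.05, which rejects exogeneity of lotsize, while the Sargan p-value of 0.827 does not reject validity of the instruments. The Sargan test is available here because there are two instruments for one endogenous regressor, leaving one overidentifying restriction. As a rough sketch of what the two statistics measure, they can be extracted from the summary object and reproduced with auxiliary regressions. The names diag_tests, hp, v_hat, u_hat, aux_wh and aux_sargan are assumptions introduced for illustration; the auxiliary regressions are the textbook forms of the tests, not necessarily the exact internal code of AER.

```r
library(AER)
data(HousePrices)

mlr2 <- ivreg(price ~ lotsize + bedrooms | bedrooms + driveway + garage,
              data = HousePrices)

# The diagnostic tests are stored as a matrix on the summary object.
diag_tests <- summary(mlr2, diagnostics = TRUE)$diagnostics
print(diag_tests)

# Wu-Hausman by hand: add the first stage residuals to the original model
# and test their coefficient (squared t statistic = F statistic).
stage1 <- lm(lotsize ~ bedrooms + driveway + garage, data = HousePrices)
hp <- transform(HousePrices, v_hat = residuals(stage1))
aux_wh <- lm(price ~ lotsize + bedrooms + v_hat, data = hp)
wu_hausman <- coef(summary(aux_wh))["v_hat", "t value"]^2

# Sargan by hand: regress the 2SLS residuals on all first stage regressors
# and multiply the R-squared by the number of observations.
hp$u_hat <- residuals(mlr2)
aux_sargan <- lm(u_hat ~ bedrooms + driveway + garage, data = hp)
sargan <- nrow(hp) * summary(aux_sargan)$r.squared

# One overidentifying restriction: 2 instruments - 1 endogenous regressor.
sargan_p <- pchisq(sargan, df = 1, lower.tail = FALSE)

c(wu_hausman = wu_hausman, sargan = sargan, sargan_p = sargan_p)
```

Working from the diagnostics matrix rather than the printed summary makes it easier to use the statistics and p-values in later code.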

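The homoskedasticity assumption noted in the fifth step can be relaxed by passing a covariance estimator to summary through its vcov parameter. The sketch below uses the sandwich function from the sandwich package as one possible heteroskedasticity consistent estimator; the object names s_default and s_robust are assumptions introduced for illustration.

```r
library(AER)
library(sandwich)  # heteroskedasticity consistent covariance estimators
data(HousePrices)

mlr2 <- ivreg(price ~ lotsize + bedrooms | bedrooms + driveway + garage,
              data = HousePrices)

# Default summary: standard errors assume homoskedastic errors.
s_default <- summary(mlr2, diagnostics = TRUE)

# Same model, with a heteroskedasticity consistent covariance matrix.
s_robust <- summary(mlr2, vcov = sandwich, diagnostics = TRUE)

# Coefficient estimates are unchanged; only the standard errors move.
cbind(
  estimate   = s_default$coefficients[, "Estimate"],
  se_default = s_default$coefficients[, "Std. Error"],
  se_robust  = s_robust$coefficients[, "Std. Error"]
)
```

Only the inference changes with this option, so the point estimates stay as reported in Out [5], while the standard errors and the statistics derived from them may differ.
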

References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

[2] AER R Package. Christian Kleiber and Achim Zeileis. (2008). Applied Econometrics with R. Springer-Verlag, New York.
