Last Update: March 24, 2022
Instrumental Variables: Two Stage Least Squares in R can be done using the AER package ivreg function, which estimates a linear regression in which at least one independent variable is correlated with the error term (endogenous). The main parameters of the ivreg function are formula, written as y ~ x1 + x2 | x2 + z1 + z2, and data. In formula, the part before the vertical bar is the original model, with x1 as endogenous independent variable and x2 as exogenous independent variable. The part after the vertical bar lists the first stage least squares regressors: the exogenous independent variable x2 and the instrumental variables z1 and z2. The data parameter takes a data.frame object that includes the model variables.
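In equation form, the formula y ~ x1 + x2 | x2 + z1 + z2 can be read as the following stages. This is only a sketch of the standard two stage least squares setup; the coefficient symbols and the error terms u, v and e are notation introduced here for illustration and do not appear in the AER documentation.

```latex
\begin{aligned}
\text{Original model:} \quad & y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u,
  \qquad \operatorname{Cov}(x_1, u) \neq 0 \\
\text{First stage:} \quad & x_1 = \pi_0 + \pi_1 x_2 + \pi_2 z_1 + \pi_3 z_2 + v \\
\text{Second stage:} \quad & y = \beta_0 + \beta_1 \hat{x}_1 + \beta_2 x_2 + e
\end{aligned}
```

Here \hat{x}_1 denotes the first stage fitted values of x1, and the instrumental variables z1 and z2 are assumed to be uncorrelated with u.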
As an example, we compare the estimated coefficients tables and F-statistics of two models fitted to the HousePrices data object included within the AER package [1]. The first is the original multiple linear regression of house price explained by its lot size and number of bedrooms. The second is the second stage least squares multiple linear regression of house price explained by the first stage fitted values of lot size and by number of bedrooms, with whether the house has a driveway and number of garage places as instrumental variables.
First, we load the AER package for data and two stage least squares estimation [2].
In [1]:
library(AER)
Second, we load the HousePrices data object from the AER package using the data function. Then, we print the first six rows of its first three, sixth and eleventh columns using the head function to view the data.frame structure.
In [2]:
data(HousePrices)
head(HousePrices[, c(1:3, 6, 11)])
Out [2]:
  price lotsize bedrooms driveway garage
1 42000    5850        3      yes      1
2 38500    4000        2      yes      0
3 49500    3060        3      yes      0
4 60500    6650        3      yes      0
5 61000    6360        2      yes      0
6 66000    4160        3      yes      0
Third, we fit the original model with the lm function using variables within the HousePrices data object and store the outcome within the mlr1 object. Within the lm function, parameter formula = price ~ lotsize + bedrooms fits the original model where house price is explained by its lot size and number of bedrooms.
In [3]:
mlr1 <- lm(formula = price ~ lotsize + bedrooms, data = HousePrices)
Fourth, we fit the two stage least squares model with the ivreg function using variables within the HousePrices data object and store the outcome within the mlr2 object. Within the ivreg function, parameter formula = price ~ lotsize + bedrooms | bedrooms + driveway + garage has two parts. The part before the vertical bar is the original model, where house price is explained by lot size as endogenous independent variable and number of bedrooms as exogenous independent variable. The part after the vertical bar lists the first stage least squares regressors: number of bedrooms as exogenous independent variable, and whether the house has a driveway and number of garage places as instrumental variables. Notice that estimating the two stages one at a time with the lm function, instead of simultaneously with the ivreg function, would give correct coefficients but incorrect standard errors and F-statistic, because the second stage lm residuals are computed from the fitted values of lot size rather than from lot size itself.
In [4]:
mlr2 <- ivreg(formula = price ~ lotsize + bedrooms | bedrooms + driveway + garage, data = HousePrices)
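The remark about stage by stage estimation can be checked with a short sketch. It refits everything so that it runs on its own; the object names stage1, stage2, hp, tsls and the column name lotsize_hat are hypothetical names chosen for this illustration.

```r
library(AER)
data(HousePrices)

# Stage 1: regress the endogenous variable on all exogenous variables
stage1 <- lm(lotsize ~ bedrooms + driveway + garage, data = HousePrices)

# Stage 2: replace lot size by its first stage fitted values
hp <- transform(HousePrices, lotsize_hat = fitted(stage1))
stage2 <- lm(price ~ lotsize_hat + bedrooms, data = hp)

# Simultaneous estimation of both stages for comparison
tsls <- ivreg(price ~ lotsize + bedrooms | bedrooms + driveway + garage,
              data = HousePrices)

# Coefficients from both approaches, side by side
cbind(manual = coef(stage2), ivreg = coef(tsls))

# Standard errors from both approaches, side by side
cbind(manual = sqrt(diag(vcov(stage2))), ivreg = sqrt(diag(vcov(tsls))))
```

The first table should show matching coefficients up to numerical tolerance, while the second should show different standard errors, which is why the ivreg function is preferred for inference.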
Fifth, we get the mlr1 model summary results with the summary function and store the outcome within the smlr1 object. Within the summary function, parameter object = mlr1 includes the mlr1 model results. Then, we get the mlr2 model summary results with the summary function for ivreg and store the outcome within the smlr2 object. Within the summary function for ivreg, parameter object = mlr2 includes the mlr2 model results and parameter test = "F" requests an F-test. Notice that parameter test = "F" was only included as an educational example and can be modified according to your needs. Also, notice that the mlr2 two stage least squares standard errors and test statistics assume homoskedastic errors unless a heteroskedasticity consistent variance covariance matrix estimator is supplied to the summary function for ivreg.
In [5]:
smlr1 <- summary(object = mlr1)
smlr2 <- summary(object = mlr2, test = "F")
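As a hedged sketch of the heteroskedasticity consistent alternative mentioned in the fifth step, the summary function for ivreg accepts a covariance estimator through its vcov. parameter. The example below passes the sandwich function from the sandwich package and also sets diagnostics = TRUE to request instrument diagnostic tests; the object name smlr2_hc is hypothetical, and the choice of estimator is an assumption that should be adapted to your data.

```r
library(AER)
data(HousePrices)

# Refit the two stage least squares model so the example runs on its own
mlr2 <- ivreg(price ~ lotsize + bedrooms | bedrooms + driveway + garage,
              data = HousePrices)

# Summary with a heteroskedasticity consistent covariance matrix
# and instrument diagnostic tests
smlr2_hc <- summary(mlr2, vcov. = sandwich::sandwich, diagnostics = TRUE)

# Same estimates as before, but robust standard errors
smlr2_hc$coefficients

# Diagnostic tests for the instruments
smlr2_hc$diagnostics
```

Only the standard errors, test statistics and p-values change; the estimated coefficients are the same as in the homoskedastic summary.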
Sixth, we print the mlr1 model estimated coefficients table using its coefficients value.
In [6]:
smlr1$coefficients
Out [6]:
                Estimate   Std. Error   t value     Pr(>|t|)
(Intercept)  5612.599731 4102.8189131  1.367986 1.718822e-01
lotsize         6.053022    0.4243331 14.264788 1.938847e-39
bedrooms    10567.351501 1247.6764642  8.469625 2.314456e-16
Seventh, we print the mlr2 model estimated coefficients table using its coefficients value.
In [7]:
smlr2$coefficients
Out [7]:
                Estimate Std. Error   t value     Pr(>|t|)
(Intercept) -19130.15709   6540.667 -2.924802 3.590757e-03
lotsize         12.51948      1.240 10.096348 4.417073e-22
bedrooms      7680.12883   1574.086  4.879105 1.402506e-06
attr(,"df")
[1] 543
attr(,"nobs")
[1] 546
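To read the two sets of estimates side by side, a short sketch such as the following may help. It refits both models so that it runs on its own; the object name cmp and the column labels OLS and TSLS are hypothetical.

```r
library(AER)
data(HousePrices)

# Original model and two stage least squares model
mlr1 <- lm(price ~ lotsize + bedrooms, data = HousePrices)
mlr2 <- ivreg(price ~ lotsize + bedrooms | bedrooms + driveway + garage,
              data = HousePrices)

# Estimated coefficients of both models in one table
cmp <- cbind(OLS = coef(mlr1), TSLS = coef(mlr2))
cmp
```

In the outputs above, the lot size coefficient is about 6.05 in the original model and about 12.52 in the two stage least squares model, while the bedrooms coefficient falls from about 10567 to about 7680.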
Eighth, we print the mlr1 model F-statistic using its fstatistic value.
In [8]:
smlr1$fstatistic
Out [8]:
   value    numdf    dendf
159.6367   2.0000 543.0000
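Ninth, we print the mlr2 model F-statistic using its waldtest value. The printed vector contains, in order, the F-statistic, its p-value, and the numerator and denominator degrees of freedom.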
In [9]:
smlr2$waldtest
Out [9]:
[1] 9.151967e+01 5.590784e-35 2.000000e+00 5.430000e+02
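Because the waldtest value is an unnamed vector, labelling its elements can make it easier to read. The sketch below assumes the element order seen in the output above (statistic, p-value, numerator and denominator degrees of freedom); the labels and the object name wald are hypothetical. It also recomputes the p-value from the F distribution as a consistency check.

```r
library(AER)
data(HousePrices)

# Refit and summarize the two stage least squares model
mlr2 <- ivreg(price ~ lotsize + bedrooms | bedrooms + driveway + garage,
              data = HousePrices)
smlr2 <- summary(mlr2)

# Label the elements of the Wald test vector (labels are illustrative)
wald <- setNames(smlr2$waldtest, c("statistic", "p.value", "numdf", "dendf"))
wald

# Upper tail probability of the F distribution for the same statistic
pf(wald[["statistic"]], wald[["numdf"]], wald[["dendf"]], lower.tail = FALSE)
```

If the assumed order is right, the last line should reproduce the p-value stored in the vector.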
References
[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.
Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.
[2] AER R Package. Kleiber, C., and Zeileis, A. (2008). Applied Econometrics with R. Springer-Verlag, New York.