Instrumental Variables: Two Stage Least Squares

Last Update: March 24, 2022

Instrumental variables two stage least squares estimation is used when one or more linear regression independent variables are correlated with the error term (endogenous).

As an example, we can fit a three-variable original multiple linear regression with formula \hat{y}_{(o)i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{1i}+\hat{\beta}_{2}x_{2i}\;(1). The original linear regression fitted values \hat{y}_{(o)i} are the estimated y_{i} values, x_{1} is the endogenous independent variable (assumed correlated with the original linear regression error term), and x_{2} is the exogenous independent variable (assumed not correlated with the original linear regression error term).

Then, we can fit the first stage least squares multiple linear regression with formula \hat{x}_{1i}=\hat{\gamma}_{0}+\hat{\gamma}_{1}x_{2i}+\hat{\gamma}_{2}z_{1i}+\hat{\gamma}_{3}z_{2i}\;(2). The first stage least squares linear regression fitted values \hat{x}_{1i} are the estimated x_{1i} values, and z_{1} and z_{2} are the instrumental variables.

Next, we can fit the second stage least squares multiple linear regression with formula \hat{y}_{(f)i}=\hat{\delta}_{0}+\hat{\delta}_{1}\hat{x}_{1i}+\hat{\delta}_{2}x_{2i}\;(3). The second stage least squares linear regression fitted values \hat{y}_{(f)i} are the estimated y_{i} values.

Notice that the instrumental variables are not included in the original linear regression. They are assumed correlated with the endogenous independent variable and not correlated with the second stage least squares linear regression error term. Also, notice that estimating the two stages one after the other, instead of simultaneously, yields correct coefficients but incorrect standard errors and F-statistic, because the second stage residuals are then computed with the fitted values \hat{x}_{1i} instead of the observed values x_{1i}. Additionally, notice that two stage least squares linear regression estimation assumes errors are homoskedastic unless a heteroskedasticity consistent estimator is used.
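The three regressions above can be sketched in Python with NumPy. This is a minimal illustration on simulated data, not the estimation behind the tables in this article: the function names `ols`, `two_stage_least_squares` and `simulate`, the simulated coefficients, and the sample size are all assumptions made for the example. It fits formulas (1), (2) and (3), and computes homoskedastic standard errors from residuals that use the observed x_{1} rather than the first stage fitted values.

```python
import numpy as np


def ols(X, y):
    """Least squares coefficients for y = X b + e (X already has an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_stage_least_squares(y, x1, x2, z1, z2):
    """Two stage least squares with one endogenous regressor x1 (hypothetical helper)."""
    n = len(y)
    ones = np.ones(n)

    # First stage, formula (2): x1 on exogenous x2 and instruments z1, z2.
    Z = np.column_stack([ones, x2, z1, z2])
    gamma = ols(Z, x1)
    x1_hat = Z @ gamma

    # Second stage, formula (3): y on first stage fitted values and x2.
    X_hat = np.column_stack([ones, x1_hat, x2])
    delta = ols(X_hat, y)

    # Standard errors: residuals must use the observed x1, not x1_hat.
    X = np.column_stack([ones, x1, x2])
    resid = y - X @ delta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X_hat.T @ X_hat)  # assumes homoskedastic errors
    return delta, np.sqrt(np.diag(cov))


def simulate(n=5000, seed=0):
    """Simulated data in which x1 is endogenous (all values are assumptions)."""
    rng = np.random.default_rng(seed)
    z1, z2, x2 = rng.normal(size=(3, n))
    u = rng.normal(size=n)             # error term of the original regression
    v = 0.8 * u + rng.normal(size=n)   # shares a component with u
    x1 = 1.0 + 0.5 * x2 + z1 + z2 + v  # correlated with u through v
    y = 2.0 + 1.5 * x1 - 1.0 * x2 + u  # true coefficient on x1 is 1.5
    return y, x1, x2, z1, z2


y, x1, x2, z1, z2 = simulate()

# Original regression, formula (1): biased because x1 is correlated with u.
beta_ols = ols(np.column_stack([np.ones(len(y)), x1, x2]), y)
delta_2sls, se_2sls = two_stage_least_squares(y, x1, x2, z1, z2)

print("Original regression coefficients:", beta_ols)
print("Two stage least squares coefficients:", delta_2sls)
print("Two stage least squares standard errors:", se_2sls)
```

In this simulation the original regression coefficient on x_{1} is pushed away from its simulated value of 1.5 by the correlation between x_{1} and the error term, while the two stage least squares coefficient should land close to it. For a heteroskedasticity consistent estimator, the covariance line would need to be replaced accordingly.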

Below, we find an example comparing the estimated coefficients of two regressions [1]. The original multiple linear regression explains house price by lot size and number of bedrooms. The second stage least squares multiple linear regression explains house price by the first stage least squares fitted values of lot size and by number of bedrooms, using whether the house has a driveway and its number of garage places as instrumental variables.

Table 1. Microsoft Excel® estimated coefficients from the original multiple linear regression of house price explained by lot size and number of bedrooms.
Table 2. Microsoft Excel® estimated coefficients from the second stage least squares multiple linear regression of house price explained by the first stage least squares fitted values of lot size and by number of bedrooms, using whether the house has a driveway and its number of garage places as instrumental variables.

References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

Source: AER R Package HousePrices Object. Kleiber, C., and Zeileis, A. (2008). Applied Econometrics with R. Springer-Verlag, New York.
