Multicollinearity: Variance Inflation Factor

Last Update: February 21, 2022

Multicollinearity occurs when two or more independent variables x_{1},...,x_{p} of a linear regression are highly correlated, which complicates isolating their individual explanatory relationships with the dependent variable y. It can be tested through the estimated variance inflation factor vif_{j} of each independent variable in the model. As a rule of thumb, if the estimated variance inflation factor vif_{j} of independent variable j is between five and ten, then that variable might be highly correlated with the other independent variables. If vif_{j} is greater than ten, then that variable is highly correlated with them.
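The thresholds above can be sketched as a small helper. This is only an illustration: the function name classify_vif and its labels are hypothetical, and two choices are assumptions not stated in the text, namely that the boundary values five and ten belong to the middle band and that values below five raise no flag.

```python
def classify_vif(vif):
    """Label one estimated variance inflation factor using the 5 and 10 thresholds.

    Assumptions (not from the text): 5 and 10 themselves fall in the
    middle band, and values below 5 are reported as not flagged.
    """
    if vif > 10:
        return "highly correlated"
    if vif >= 5:
        return "might be highly correlated"
    return "not flagged"


# Hypothetical values, chosen only to show one case per band.
for value in (1.2, 7.5, 12.0):
    print(value, classify_vif(value))
```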

As an example, we can fit a multiple linear regression with five variables (dependent variable y and four independent variables) with formula \hat{y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{1i}+\hat{\beta}_{2}x_{2i}+\hat{\beta}_{3}x_{3i}+\hat{\beta}_{4}x_{4i}\;(1). Then we can estimate the variance inflation factor of independent variable x_{1} individually with formula vif_{1}=\frac{1}{1-r_{1}^2}\;(2). That is, vif_{1} is equal to one divided by one minus the coefficient of multiple determination r_{1}^2 from the multiple linear regression with formula \hat{x}_{1i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{2i}+\hat{\beta}_{2}x_{3i}+\hat{\beta}_{3}x_{4i}\;(3). Notice that regression (3) only includes the independent variables of regression (1), so its estimated coefficients are not those of regression (1). The independent variable x_{1}, for which the variance inflation factor vif_{1} is estimated, becomes the dependent variable, while the others remain independent variables.
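The procedure in formulas (2) and (3) can be sketched in Python with NumPy: each independent variable is regressed in turn on the remaining ones, and its variance inflation factor is computed from the resulting coefficient of determination. This is a minimal sketch rather than a reference implementation. The function name variance_inflation_factors is hypothetical, and the data are synthetic, generated only for illustration (they are not the house price data).

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one estimated VIF per column of X (rows are observations)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]                  # x_j becomes the dependent variable
        others = np.delete(X, j, axis=1)  # the rest stay independent variables
        design = np.column_stack([np.ones(n), others])  # intercept column
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = residuals @ residuals
        ss_tot = np.sum((target - target.mean()) ** 2)
        r_squared = 1.0 - ss_res / ss_tot  # r_j^2 of the auxiliary regression
        vifs[j] = 1.0 / (1.0 - r_squared)  # formula (2)
    return vifs


# Synthetic illustration: x2 is built as a noisy copy of x1,
# while x3 and x4 are generated independently.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)
x4 = rng.normal(size=500)
X = np.column_stack([x1, x2, x3, x4])

vifs = variance_inflation_factors(X)
print(vifs)
```

In this synthetic setup, x_{1} and x_{2} should receive variance inflation factors well above ten, while x_{3} and x_{4} should stay close to one, because only the first pair was constructed to be highly correlated.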

Below is an example of the individually estimated variance inflation factors of the independent variables from a multiple linear regression of house price explained by lot size and number of bedrooms, bathrooms and stories [1].

Table 1. Microsoft Excel® individually estimated variance inflation factors of the independent variables from a multiple linear regression of house price explained by lot size and number of bedrooms, bathrooms and stories.

References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

Source: AER R Package HousePrices Object. Christian Kleiber and Achim Zeileis. (2008). Applied Econometrics with R. Springer-Verlag, New York.