# Multicollinearity: Variance Inflation Factor in R

Last Update: February 21, 2022

Multicollinearity in R can be tested using the `car` package `vif` function, which estimates the variance inflation factors of the independent variables in a multiple linear regression. The main parameter of `vif` is `mod`, a previously fitted `lm` model. The independent variables' variance inflation factors can also be estimated as the main diagonal values of their inverse correlation matrix using the `MASS` package `ginv` function. The main parameter of `ginv` is `X`, here the independent variables' correlation matrix previously estimated with the `stats` package `cor` function.
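Both approaches compute the same quantity. As background, the variance inflation factor of independent variable $j$ can be written as below, where $R_j^2$ is the coefficient of determination from regressing variable $j$ on all the other independent variables and $\mathbf{R}$ is the independent variables' correlation matrix (this notation is introduced here and is not from the original text):

$$
\mathrm{VIF}_j = \frac{1}{1 - R_j^2} = \left[\mathbf{R}^{-1}\right]_{jj}
$$

A value of 1 means variable $j$ has no linear relationship with the other independent variables; larger values indicate a stronger linear dependence on them.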

As an example, we can test multicollinearity among the independent variables of a multiple linear regression of house price explained by its lot size and number of bedrooms, bathrooms and stories, using the data included in the `AER` package `HousePrices` object [1].

First, we load the packages `AER` for data, `car` for estimating variance inflation factors, `MASS` for estimating the inverse correlation matrix and `corrplot` for the inverse correlation matrix chart [2].

``````In [1]:
library(AER)
library(car)
library(MASS)
library(corrplot)
``````

Second, we load the `HousePrices` data object from the `AER` package using the `data` function and print the first six rows and five columns using the `head` function to view the `data.frame` structure.

``````In [2]:
data(HousePrices)
head(HousePrices[, 1:5])
``````
``````Out [2]:
  price lotsize bedrooms bathrooms stories
1 42000    5850        3         1       2
2 38500    4000        2         1       1
3 49500    3060        3         1       1
4 60500    6650        3         1       2
5 61000    6360        2         1       1
6 66000    4160        3         1       1
``````

Third, we fit the multiple linear regression model using the `lm` function and store the outcome in the `mlr` object. Within `lm`, the parameter `formula = price ~ lotsize + bedrooms + bathrooms + stories` fits a model where house price is explained by its lot size and number of bedrooms, bathrooms and stories, and `data = HousePrices` supplies the data. Then, we print the independent variables' estimated variance inflation factors using the `vif` function. Within `vif`, the parameter `mod = mlr` supplies the previously fitted `lm` model.

``````In [3]:
mlr <- lm(formula = price ~ lotsize + bedrooms + bathrooms + stories, data = HousePrices)
vif(mod = mlr)
``````
``````Out [3]:
  lotsize  bedrooms bathrooms   stories
 1.047054  1.310851  1.239203  1.251087
``````
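To see what the `vif` output above measures, here is a minimal base R sketch of the auxiliary regression definition: each independent variable is regressed on all the other independent variables, and its variance inflation factor is 1 / (1 - R²) of that regression. The helper `aux_vif` and the simulated `ivar_sim` data are hypothetical stand-ins so the example runs without the `AER` data; only `lm`, `reformulate` and `summary` from base R `stats` are used.

```r
# Sketch: variance inflation factors from auxiliary regressions.
# Assumption: simulated predictors stand in for HousePrices[, 2:5].
set.seed(123)
n <- 200
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)  # x2 is partly explained by x1
x3 <- rnorm(n)             # x3 is independent of x1 and x2
ivar_sim <- data.frame(x1 = x1, x2 = x2, x3 = x3)

# Hypothetical helper: one VIF per column of a numeric data frame
aux_vif <- function(X) {
  vapply(names(X), function(j) {
    # Regress predictor j on all the remaining predictors
    others <- setdiff(names(X), j)
    fit <- lm(reformulate(others, response = j), data = X)
    # VIF_j = 1 / (1 - R_j^2)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
}

vif_aux <- aux_vif(ivar_sim)
print(round(vif_aux, 3))
```

Applied to `HousePrices[, 2:5]`, this helper should reproduce the `vif(mod = mlr)` values, since for numeric independent variables both definitions coincide.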

Fourth, we create a data frame subset with the independent variables (columns 2 to 5: `lotsize`, `bedrooms`, `bathrooms` and `stories`) and store it in the `ivar` object. Next, we estimate the inverse of their correlation matrix using the `ginv` function, store the outcome in the `ivaricor` object and label its rows and columns with the variable names, since `ginv` returns an unnamed matrix. Within `ginv`, the parameter `X = cor(ivar)` supplies the independent variables' correlation matrix estimated with the `cor` function. The main diagonal values of `ivaricor` are the variance inflation factors and match the `vif` output above.

``````In [4]:
ivar <- HousePrices[, 2:5]
ivaricor <- ginv(X = cor(ivar))
colnames(ivaricor) <- colnames(ivar)
rownames(ivaricor) <- colnames(ivar)
ivaricor
``````
``````Out [4]:
               lotsize    bedrooms  bathrooms      stories
lotsize    1.047054041 -0.09909201 -0.1683001  0.007354973
bedrooms  -0.099092014  1.31085130 -0.3353444 -0.417827752
bathrooms -0.168300120 -0.33534441  1.2392031 -0.250688885
stories    0.007354973 -0.41782775 -0.2506889  1.251086952
``````
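`ginv` computes the Moore-Penrose generalized inverse. When the correlation matrix is non-singular, base R `solve` gives the same inverse, so the variance inflation factors can also be read off with `diag(solve(...))`. The sketch below checks this on simulated predictors, which are an assumption standing in for `HousePrices[, 2:5]`.

```r
# Sketch: VIFs as the main diagonal of the inverse correlation matrix,
# comparing base R solve() with MASS::ginv().
# Assumption: simulated predictors stand in for HousePrices[, 2:5].
library(MASS)  # ginv(); MASS is a recommended package shipped with R

set.seed(42)
n <- 150
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
x3 <- 0.3 * x1 - 0.4 * x2 + rnorm(n)
X <- data.frame(x1 = x1, x2 = x2, x3 = x3)

cor_mat <- cor(X)
inv_solve <- solve(cor_mat)  # ordinary inverse; needs a non-singular matrix
inv_ginv <- ginv(cor_mat)    # Moore-Penrose generalized inverse
dimnames(inv_ginv) <- dimnames(cor_mat)  # ginv() returns an unnamed matrix

# The main diagonal holds one variance inflation factor per predictor
vif_diag <- diag(inv_solve)
print(round(vif_diag, 3))
```

For a well-conditioned correlation matrix both inverses agree. `ginv` also returns a result for a singular matrix (for example, with perfectly collinear independent variables) where `solve` stops with an error, but its diagonal values should then not be read as variance inflation factors.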

Fifth, we can additionally visualize the inverse correlation matrix, whose main diagonal values are the variance inflation factors, using the `corrplot` package `corrplot` function. Within `corrplot`, the parameter `corr = ivaricor` supplies the matrix to visualize, `method = "number"` displays its values as numbers, and `is.corr = FALSE` indicates that the input matrix is not a correlation matrix, since inverse correlation matrix values can fall outside the [-1, 1] range.

``````In [5]:
corrplot(corr = ivaricor, method = "number", is.corr = FALSE)
``````
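``Out [5]:`` [Chart: `corrplot` number chart of the `ivaricor` inverse correlation matrix, with the variance inflation factors on its main diagonal]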
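The reason for `is.corr = FALSE` can be checked directly: a correlation matrix always stays within [-1, 1], while the main diagonal values of its inverse are at least 1 and exceed 1 whenever the independent variables are correlated, which is outside the range `corrplot` expects for a correlation matrix. This base R sketch uses simulated data as an assumed stand-in for `HousePrices[, 2:5]`.

```r
# Sketch: value ranges of a correlation matrix and its inverse.
# Assumption: simulated correlated predictors stand in for HousePrices[, 2:5].
set.seed(7)
n <- 100
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.5)  # strongly correlated with x1
x3 <- rnorm(n)
cor_mat <- cor(cbind(x1, x2, x3))
inv_cor <- solve(cor_mat)

print(range(cor_mat))  # always within [-1, 1]
print(range(inv_cor))  # main diagonal values are >= 1, here clearly above 1
```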

## References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

[2] AER R Package: Christian Kleiber and Achim Zeileis. (2008). Applied Econometrics with R. Springer-Verlag, New York.

car R Package: John Fox and Sanford Weisberg. (2019). An R Companion to Applied Regression. Third Edition. Sage, Thousand Oaks, CA.

MASS R Package: W. N. Venables and B. D. Ripley. (2002). Modern Applied Statistics with S. Fourth Edition. Springer, New York.

corrplot R Package: Taiyun Wei and Viliam Simko. (2021). R package ‘corrplot’: Visualization of a Correlation Matrix. Version 0.90.
