
Multicollinearity: Variance Inflation Factor in R

Last Update: February 21, 2022

Multicollinearity in R can be tested by estimating the variance inflation factors of the independent variables of a multiple linear regression with the vif function from the car package. The main parameter of the vif function is mod, the previously fitted lm model. The same variance inflation factors can also be obtained as the main diagonal values of the inverse of the independent variables' correlation matrix, computed with the ginv function from the MASS package. The main parameter of the ginv function is X, here the correlation matrix of the independent variables previously estimated with the cor function from the stats package. The ginv function computes a generalized inverse, which coincides with the ordinary inverse when the matrix is non-singular.
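
To make the link between the two approaches explicit, the standard definition can be written out as a short sketch. The notation is an assumption introduced here and is not used elsewhere in this article: R_j^2 is the coefficient of determination from regressing the j-th independent variable on the remaining ones, bold R is the correlation matrix of the k independent variables, and the identity holds when that matrix is non-singular.

```latex
\mathrm{VIF}_j \;=\; \frac{1}{1 - R_j^2} \;=\; \left[\mathbf{R}^{-1}\right]_{jj},
\qquad j = 1, \dots, k
```

A variance inflation factor of 1 therefore corresponds to R_j^2 = 0, meaning the variable is linearly unrelated to the other independent variables.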

As an example, we can test for multicollinearity among the independent variables of a multiple linear regression in which house price is explained by lot size, number of bedrooms, number of bathrooms and number of stories, using the HousePrices data object included in the AER package [1].

First, we load the packages: AER for the data, car for estimating variance inflation factors, MASS for estimating the inverse correlation matrix and corrplot for charting the inverse correlation matrix [2].

In [1]:
library(AER)
library(car)
library(MASS)
library(corrplot)

Second, we load the HousePrices data object from the AER package using the data function and print the first six rows and first five columns using the head function to view the data.frame structure.

In [2]:
data(HousePrices)
head(HousePrices[, 1:5])
Out [2]:
  price lotsize bedrooms bathrooms stories
1 42000    5850        3         1       2
2 38500    4000        2         1       1
3 49500    3060        3         1       1
4 60500    6650        3         1       2
5 61000    6360        2         1       1
6 66000    4160        3         1       1

Third, we fit the multiple linear regression model using the lm function and store the outcome in the mlr object. Within the lm function, the parameter formula = price ~ lotsize + bedrooms + bathrooms + stories specifies a model in which house price is explained by lot size, number of bedrooms, number of bathrooms and number of stories, and the parameter data = HousePrices supplies the data. Then, we print the estimated variance inflation factors of the independent variables using the vif function. Within the vif function, the parameter mod = mlr passes the previously fitted lm model.

In [3]:
mlr <- lm(formula = price ~ lotsize + bedrooms + bathrooms + stories, data = HousePrices)
vif(mod = mlr)
Out [3]:
  lotsize  bedrooms bathrooms   stories 
 1.047054  1.310851  1.239203  1.251087
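
As a cross-check on what these numbers mean, each variance inflation factor can be reproduced by hand as 1 / (1 - R²), where R² comes from an auxiliary regression of one independent variable on the others. The following is a minimal sketch in base R only. The helper name vif_by_regression is hypothetical, and the built-in mtcars data is used as a stand-in so that the block runs without the AER package; it is not the data used in this article.

```r
# Hypothetical helper (not part of car or MASS): variance inflation factors
# computed from auxiliary regressions of each column on all other columns.
vif_by_regression <- function(predictors) {
  sapply(names(predictors), function(target) {
    others <- setdiff(names(predictors), target)
    # Regress the target variable on the remaining independent variables.
    aux <- lm(reformulate(others, response = target), data = predictors)
    # VIF = 1 / (1 - R^2) of the auxiliary regression.
    1 / (1 - summary(aux)$r.squared)
  })
}

# Assumption: mtcars columns stand in for the article's independent variables.
x <- mtcars[, c("disp", "hp", "wt", "drat")]
vif_values <- vif_by_regression(x)
print(round(vif_values, 3))
```

With the AER package loaded, the same helper could be applied to HousePrices[, 2:5], and the result should agree with the vif output above up to rounding.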

Fourth, we create a data frame subset containing only the independent variables (columns 2 to 5 of HousePrices) and store it in the ivar object. Next, we estimate the inverse of their correlation matrix using the ginv function and store the outcome in the ivaricor object. Within the ginv function, the parameter X = cor(ivar) passes the correlation matrix of the independent variables estimated with the cor function. Because ginv returns a matrix without row and column names, we assign the variable names before printing. The main diagonal values of the printed matrix are the variance inflation factors and match the vif output above.

In [4]:
ivar <- HousePrices[, 2:5]
ivaricor <- ginv(X = cor(ivar))
colnames(ivaricor) <- colnames(ivar)
rownames(ivaricor) <- colnames(ivar)
ivaricor
Out [4]:
               lotsize    bedrooms  bathrooms      stories
lotsize    1.047054041 -0.09909201 -0.1683001  0.007354973
bedrooms  -0.099092014  1.31085130 -0.3353444 -0.417827752
bathrooms -0.168300120 -0.33534441  1.2392031 -0.250688885
stories    0.007354973 -0.41782775 -0.2506889  1.251086952
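
The equivalence used in this step can be verified numerically. The sketch below uses base R only and makes two assumptions that differ from the article: the built-in mtcars data stands in for HousePrices, and solve is used in place of ginv, which gives the same inverse when the correlation matrix is non-singular. It checks that the product of the correlation matrix and its inverse is the identity matrix, and that one diagonal value equals 1 / (1 - R²) from the matching auxiliary regression.

```r
# Assumption: mtcars columns stand in for the article's independent variables.
x <- mtcars[, c("disp", "hp", "wt", "drat")]

r <- cor(x)        # correlation matrix of the independent variables
r_inv <- solve(r)  # ordinary inverse (assumes r is non-singular)

# Main diagonal of the inverse correlation matrix, labelled by variable.
vif_diag <- setNames(diag(r_inv), colnames(x))

# Largest deviation of r %*% r_inv from the identity matrix.
identity_check <- max(abs(r %*% r_inv - diag(ncol(x))))

# Auxiliary regression for one variable, for comparison with the diagonal.
aux <- lm(wt ~ disp + hp + drat, data = x)
vif_wt <- 1 / (1 - summary(aux)$r.squared)

print(round(vif_diag, 3))
print(isTRUE(all.equal(vif_diag[["wt"]], vif_wt)))
```

Unlike ginv, solve keeps the row and column names of its input, so no renaming step would be needed if it were used in its place.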

Fifth, we can additionally visualize the inverse correlation matrix, whose main diagonal values are the estimated variance inflation factors of the independent variables, using the corrplot function from the corrplot package. Within the corrplot function, the parameter corr = ivaricor passes the matrix to visualize, method = "number" selects the visualization method, and is.corr = FALSE indicates that the input is not a correlation matrix, so its values are not restricted to the range from -1 to 1.

In [5]:
corrplot(corr = ivaricor, method = "number", is.corr = FALSE)
Out [5]:
Figure 1. Multiple linear regression independent variables inverse correlation matrix chart.
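
One way to read the variance inflation factors printed in Out [3] is to convert each back into the R² of its auxiliary regression, using R² = 1 - 1 / VIF, and to compare the factors against a cutoff. The sketch below does both in base R. The values are copied from Out [3]; the cutoff of 5 is an assumption chosen for illustration (commonly quoted rules of thumb vary), not a recommendation from this article.

```r
# Variance inflation factors copied from Out [3] above.
vif_values <- c(lotsize = 1.047054, bedrooms = 1.310851,
                bathrooms = 1.239203, stories = 1.251087)

# Share of each variable's variance explained by the other
# independent variables: R^2 = 1 - 1 / VIF.
implied_r2 <- 1 - 1 / vif_values

# Assumption: illustrative cutoff; adjust to the convention you follow.
cutoff <- 5
flagged <- names(vif_values)[vif_values > cutoff]

print(round(implied_r2, 3))
print(flagged)
```

Under this assumed cutoff no variable is flagged, which is consistent with all four factors being close to 1.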

References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

[2] AER R Package: Christian Kleiber and Achim Zeileis. (2008). Applied Econometrics with R. Springer-Verlag, New York.

car R Package: John Fox and Sanford Weisberg. (2019). An R Companion to Applied Regression. Third Edition. Sage, Thousand Oaks, CA.

MASS R Package: W. N. Venables and B. D. Ripley. (2002). Modern Applied Statistics with S. Fourth Edition. Springer, New York.

corrplot R Package: Taiyun Wei and Viliam Simko. (2021). R package ‘corrplot’: Visualization of a Correlation Matrix. Version 0.90.
