Heteroskedasticity: Breusch-Pagan and White Tests in R

Last Update: February 21, 2022

The Breusch-Pagan test for heteroskedasticity can be done in R with the bptest function from the lmtest package. It evaluates whether the independent variables of a linear regression explain the variance of its errors. The main parameters of bptest are formula, the fitted lm model to be tested, and varformula, a formula describing the independent variables used to explain the variance of the model errors.

The White test for heteroskedasticity can also be done in R with the bptest function from the lmtest package. It evaluates whether the independent variables of a linear regression and their squares explain the variance of its errors. The main parameters of bptest are again formula, the fitted lm model to be tested, and varformula, which here is a formula describing the independent variables and the squared independent variables used to explain the variance of the model errors.
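
As a rough sketch of what both tests compute under the default studentized setting of bptest, the squared residuals of the fitted model are regressed on the variables in varformula. The symbols below (z for the varformula variables, q for their number, n for the sample size) are notation introduced here for illustration only.

```latex
\hat{e}_i^{2} = \gamma_0 + \gamma_1 z_{i1} + \dots + \gamma_q z_{iq} + v_i,
\qquad
\mathrm{BP} = n \, R^2_{\text{aux}} \;\overset{a}{\sim}\; \chi^2_q
\quad \text{under } H_0 : \gamma_1 = \dots = \gamma_q = 0
```

For the Breusch-Pagan test the z variables are the independent variables themselves. For the White test they also include the squares and, optionally, the cross products.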

As an example, we run the Breusch-Pagan, White (no cross terms) and White (cross terms) tests on a multiple linear regression of house price explained by lot size and number of bedrooms, using the HousePrices data included in the AER package [1].

First, we load the AER package for the data and the lmtest package for the Breusch-Pagan and White tests [2].

In [1]:
library(AER)
library(lmtest)

Second, we load the HousePrices data object from the AER package using the data function and print the first six rows of its first three columns using the head function to view the data.frame structure.

In [2]:
data(HousePrices)
head(HousePrices[, 1:3])
Out [2]:
  price lotsize bedrooms
1 42000    5850        3
2 38500    4000        2
3 49500    3060        3
4 60500    6650        3
5 61000    6360        2
6 66000    4160        3

Third, we fit the multiple linear regression using the lm function and store the results in the mlr object. Within lm, the parameter formula = price ~ lotsize + bedrooms fits a model where house price is explained by lot size and number of bedrooms.

In [3]:
mlr <- lm(formula = price ~ lotsize + bedrooms, data = HousePrices)

Fourth, we run the Breusch-Pagan test using the bptest function. Within bptest, the parameter formula = mlr passes the fitted model to be tested, and varformula = ~ lotsize + bedrooms is the formula describing the independent variables used to explain the variance of the mlr model errors. The null hypothesis is homoskedasticity, so a small p-value is evidence against constant error variance.

In [4]:
bptest(formula = mlr, varformula = ~ lotsize + bedrooms, data = HousePrices)
Out [4]:
	studentized Breusch-Pagan test

data:  mlr
BP = 66.222, df = 2, p-value = 4.17e-15
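
To make the test less of a black box, the statistic above can be reproduced by hand. This is a minimal sketch, assuming the default studentize = TRUE of bptest. The names e2, aux, q, bp_manual and p_manual are illustrative and not part of the original example.

```r
library(AER)     # provides the HousePrices data
library(lmtest)  # provides bptest

data(HousePrices)
mlr <- lm(price ~ lotsize + bedrooms, data = HousePrices)

# Squared residuals of the fitted model
e2 <- resid(mlr)^2

# Auxiliary regression of squared residuals on the varformula variables
aux <- lm(e2 ~ lotsize + bedrooms, data = HousePrices)

n <- nobs(mlr)                            # number of observations
q <- length(coef(aux)) - 1                # auxiliary regressors, excluding intercept
bp_manual <- n * summary(aux)$r.squared   # statistic: n times auxiliary R-squared
p_manual <- pchisq(bp_manual, df = q, lower.tail = FALSE)

# Compare with bptest; the two statistics should agree
bp <- bptest(mlr, varformula = ~ lotsize + bedrooms, data = HousePrices)
all.equal(bp_manual, unname(bp$statistic))
```

If the comparison returns TRUE, the auxiliary regression reproduces the BP value that bptest prints, and p_manual is the matching chi-squared p-value with q degrees of freedom.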

Fifth, we run the White test (no cross terms), also using the bptest function. Within bptest, the parameter formula = mlr passes the fitted model to be tested, and varformula = ~ lotsize + I(lotsize^2) + bedrooms + I(bedrooms^2) is the formula describing the independent variables and the squared independent variables used to explain the variance of the mlr model errors. Within varformula, the I function is needed so that ^ is treated as an arithmetic operator rather than as a formula operator. Notice that bptest still prints the title studentized Breusch-Pagan test, although with this varformula the test being done is the White test (no cross terms).

In [5]:
bptest(formula = mlr, varformula = ~ lotsize + I(lotsize^2) + bedrooms + I(bedrooms^2), data = HousePrices)
Out [5]:
	studentized Breusch-Pagan test

data:  mlr
BP = 67.253, df = 4, p-value = 8.622e-14
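
The role of the I function can be checked without fitting anything, by asking base R which terms a formula expands to. This is a small sketch using terms from the stats package. The object names labels_plain and labels_protected are illustrative.

```r
# Without I(): ^ is a formula operator, so lotsize^2 collapses to lotsize
labels_plain <- attr(terms(~ lotsize + lotsize^2), "term.labels")
labels_plain

# With I(): ^ is arithmetic, so the squared variable is a separate term
labels_protected <- attr(terms(~ lotsize + I(lotsize^2)), "term.labels")
labels_protected
```

The first formula yields a single term and the second yields two, which is why omitting I would silently drop the squared variables from the variance regression.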

Sixth, we run the White test (cross terms), again using the bptest function. Within bptest, the parameter formula = mlr passes the fitted model to be tested, and varformula = ~ lotsize + I(lotsize^2) + lotsize*bedrooms + bedrooms + I(bedrooms^2) is the formula describing the independent variables, the squared independent variables and the product of the independent variables used to explain the variance of the mlr model errors. Within varformula, the I function is needed so that ^ is treated as an arithmetic operator rather than as a formula operator, while lotsize*bedrooms is left as a formula operator that adds the lotsize:bedrooms product term. Notice that bptest still prints the title studentized Breusch-Pagan test, although with this varformula the test being done is the White test (cross terms). Also notice that the White test (cross terms) is sensitive to both heteroskedasticity and misspecification of the model equation.

In [6]:
bptest(formula = mlr, varformula = ~ lotsize + I(lotsize^2) + lotsize*bedrooms + bedrooms + I(bedrooms^2), data = HousePrices)
Out [6]:
	studentized Breusch-Pagan test

data:  mlr
BP = 67.324, df = 5, p-value = 3.69e-13
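
The df = 5 reported above can be traced back to how the formula expands. The following sketch lists the terms of the cross-terms varformula with base R. The names white_cross, white_compact, labels_cross and labels_compact are illustrative, and the compact formula is only an equivalent spelling offered as an assumption-free alternative, not the form used in the original example.

```r
# varformula used for the White test with cross terms
white_cross <- ~ lotsize + I(lotsize^2) + lotsize*bedrooms + bedrooms + I(bedrooms^2)

# lotsize*bedrooms expands to lotsize + bedrooms + lotsize:bedrooms,
# and repeated main effects are kept only once
labels_cross <- attr(terms(white_cross), "term.labels")
labels_cross
length(labels_cross)  # number of variance regressors, matching the test's df

# A more compact spelling that expands to the same set of terms
white_compact <- ~ lotsize*bedrooms + I(lotsize^2) + I(bedrooms^2)
labels_compact <- attr(terms(white_compact), "term.labels")
setequal(labels_cross, labels_compact)
```

Because duplicated main effects collapse, writing lotsize and bedrooms both on their own and inside lotsize*bedrooms does not inflate the degrees of freedom.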

References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

[2] AER R Package: Christian Kleiber and Achim Zeileis. (2008). Applied Econometrics with R. Springer-Verlag, New York.

lmtest R Package: Achim Zeileis and Torsten Hothorn. (2002). Diagnostic Checking in Regression Relationships. R News, 2 (3): 7-10.
