Homogeneity of Regression Slopes: Dummy Variables in R

Last Update: February 21, 2022

Homogeneity of regression slopes can be tested in R with dummy variables and the waldtest function from the lmtest package, which evaluates whether the intercept and slopes of a linear regression are the same across populations. The main arguments of waldtest are object, which together with the model objects that follow it supplies the restricted and unrestricted lm fits, and test, a string specifying whether to do an F-test ("F") or a chi-square test ("Chisq").

As an example, we run a homogeneity Wald test that compares a restricted multiple linear regression of house prices on lot size and number of bedrooms against an unrestricted one that adds air conditioning as a dummy independent variable together with its interactions, using the HousePrices data included in the AER package [1].
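The example can be written out as a model and a hypothesis. This is only a sketch in notation chosen for illustration: the symbols $\beta_j$, $\delta_j$ and the indicator $D_i$ (1 if house $i$ has air conditioning, 0 otherwise) are assumptions and do not appear in the original text.

```latex
% Unrestricted model: intercept and both slopes may differ by aircon group
\mathrm{price}_i = \beta_0 + \beta_1\,\mathrm{lotsize}_i + \beta_2\,\mathrm{bedrooms}_i
  + \delta_0 D_i + \delta_1\,(D_i \cdot \mathrm{lotsize}_i)
  + \delta_2\,(D_i \cdot \mathrm{bedrooms}_i) + \varepsilon_i

% Homogeneity hypothesis: the restricted model drops all three D_i terms
H_0:\ \delta_0 = \delta_1 = \delta_2 = 0
\qquad \text{versus} \qquad
H_1:\ \text{at least one } \delta_j \neq 0
```

Under $H_0$ the equation reduces to the restricted model with only lot size and number of bedrooms, so the test involves three restrictions.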

First, we load the AER package for the data and the lmtest package for the Wald test [2].

In [1]:
library(AER)
library(lmtest)

Second, we load the HousePrices data object from the AER package using the data function and print the first six rows of its first three columns and tenth column using the head function to view the data.frame structure.

In [2]:
data(HousePrices)
head(HousePrices[, c(1:3, 10)])
Out [2]:
  price lotsize bedrooms aircon
1 42000    5850        3     no
2 38500    4000        2     no
3 49500    3060        3     no
4 60500    6650        3     no
5 61000    6360        2     no
6 66000    4160        3    yes
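Before fitting, it can help to confirm how the aircon column is stored, since that determines how lm turns it into a dummy variable. The following is a minimal sketch, assuming the AER package is installed; the variable names aircon_levels, aircon_coding and aircon_counts are illustrative only.

```r
library(AER)
data(HousePrices)

aircon <- HousePrices$aircon

# aircon is stored as a factor, not as a 0/1 numeric column
print(is.factor(aircon))

# The first level is the reference category under R's default treatment contrasts
aircon_levels <- levels(aircon)
print(aircon_levels)

# Coding matrix that lm uses to build the dummy column for this factor
aircon_coding <- contrasts(aircon)
print(aircon_coding)

# Number of houses in each group
aircon_counts <- table(aircon)
print(aircon_counts)
```

If one group were very small, the group-specific slopes would be estimated imprecisely, so the counts are worth checking first.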

Third, we fit the restricted multiple linear regression using the lm function and store the results in the mlr1 object. Within lm, the argument formula = price ~ lotsize + bedrooms fits the restricted model, in which house price is explained by lot size and number of bedrooms with the same intercept and slopes for every house.

In [3]:
mlr1 <- lm(formula = price ~ lotsize + bedrooms, data = HousePrices)

Fourth, we fit the unrestricted multiple linear regression using the lm function, store the results in the mlr2 object and print its summary using the summary.lm function. Within lm, the argument formula = price ~ lotsize + bedrooms + aircon + lotsize*aircon + bedrooms*aircon fits the unrestricted model, in which house price is explained by lot size, number of bedrooms, the air conditioning dummy variable and the interactions of that dummy with lot size and number of bedrooms. The interactions let the intercept and both slopes differ between houses with and without air conditioning.

Notice that the formula can also be written as price ~ lotsize*aircon + bedrooms*aircon, because the * operator expands to the individual variables lotsize, bedrooms and aircon plus their lotsize:aircon and bedrooms:aircon products. This fits the same model, although the coefficients may be listed in a different order. Also, notice that aircon is a factor with levels no and yes, which lm codes as a dummy variable equal to 0 for the reference level no and 1 for yes; this is the airconyes term in the output. Additionally, notice that the aircon dummy variable is only an educational example, which can be modified according to your needs.

In [4]:
mlr2 <- lm(formula = price ~ lotsize + bedrooms + aircon + lotsize*aircon + bedrooms*aircon, data = HousePrices)
summary.lm(mlr2)
Out [4]:
Call:
lm(formula = price ~ lotsize + bedrooms + aircon + lotsize * 
    aircon + bedrooms * aircon, data = HousePrices)

Residuals:
   Min     1Q Median     3Q    Max 
-67843 -12577  -1124   9250  90491 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)         1.536e+04  4.263e+03   3.603 0.000344 ***
lotsize             4.621e+00  4.660e-01   9.915  < 2e-16 ***
bedrooms            7.709e+03  1.326e+03   5.813 1.05e-08 ***
airconyes          -1.423e+04  9.434e+03  -1.509 0.131932    
lotsize:airconyes   2.438e+00  8.824e-01   2.763 0.005921 ** 
bedrooms:airconyes  6.125e+03  2.661e+03   2.302 0.021731 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 19370 on 540 degrees of freedom
Multiple R-squared:  0.4785,	Adjusted R-squared:  0.4737 
F-statistic: 99.09 on 5 and 540 DF,  p-value: < 2.2e-16
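The coefficients above can be read as two regression lines, one per air conditioning group. The following sketch, which assumes the AER package is installed and uses illustrative object names, adds the interaction terms to the baseline coefficients and compares the result with regressions fitted separately on each group. It also checks that the shorthand formula with * gives the same fit, comparing fitted values rather than coefficients because the coefficient order can differ between the two spellings.

```r
library(AER)
data(HousePrices)

# Unrestricted model as written in the tutorial
mlr2 <- lm(price ~ lotsize + bedrooms + aircon + lotsize*aircon + bedrooms*aircon,
           data = HousePrices)
b <- coef(mlr2)

# Implied intercept and slopes for houses without air conditioning (aircon = no)
coef_no <- b[c("(Intercept)", "lotsize", "bedrooms")]

# For aircon = yes, add the dummy and interaction coefficients to the baseline
coef_yes <- coef_no + b[c("airconyes", "lotsize:airconyes", "bedrooms:airconyes")]

# Separate regressions on each group for comparison
fit_no  <- lm(price ~ lotsize + bedrooms, data = HousePrices, subset = aircon == "no")
fit_yes <- lm(price ~ lotsize + bedrooms, data = HousePrices, subset = aircon == "yes")

print(rbind(implied_no = coef_no, separate_no = coef(fit_no)))
print(rbind(implied_yes = coef_yes, separate_yes = coef(fit_yes)))

# Shorthand formula: same fitted values as the long form
mlr2_short <- lm(price ~ lotsize*aircon + bedrooms*aircon, data = HousePrices)
same_fit <- isTRUE(all.equal(unname(fitted(mlr2)), unname(fitted(mlr2_short))))
print(same_fit)
```

The point estimates from the fully interacted model match the separate regressions, but the standard errors generally do not, because the interacted model pools the residual variance across both groups.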

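Fifth, we run the Wald test using the waldtest function. The call passes the restricted model mlr1 as object and the unrestricted model mlr2 as the next argument, and test = "F" requests an F-test. The null hypothesis is that the three coefficients involving aircon (airconyes, lotsize:airconyes and bedrooms:airconyes) are jointly zero, that is, that the intercept and slopes are homogeneous across houses with and without air conditioning. In the output below, F = 37.35 on 3 and 540 degrees of freedom with a p-value below 2.2e-16, so homogeneity is rejected at conventional significance levels. Notice that the mlr1 and mlr2 models and the argument test = "F" are only educational examples, which can be modified according to your needs.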

In [5]:
waldtest(object = mlr1, mlr2, test = "F")
Out [5]:
Wald test

Model 1: price ~ lotsize + bedrooms
Model 2: price ~ lotsize + bedrooms + aircon + lotsize * aircon + bedrooms * 
    aircon
  Res.Df Df     F    Pr(>F)    
1    543                       
2    540  3 37.35 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
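For two nested lm fits with the default covariance matrix, the Wald F statistic coincides with the classical F statistic computed from residual sums of squares. The following sketch, which assumes the AER and lmtest packages are installed and uses illustrative names such as f_manual, computes that statistic by hand and compares it with waldtest and with the base R anova function. It should reproduce the F statistic of about 37.35 reported above.

```r
library(AER)
library(lmtest)
data(HousePrices)

mlr1 <- lm(price ~ lotsize + bedrooms, data = HousePrices)
mlr2 <- lm(price ~ lotsize + bedrooms + aircon + lotsize*aircon + bedrooms*aircon,
           data = HousePrices)

# Residual sums of squares of the restricted and unrestricted models
rss1 <- deviance(mlr1)
rss2 <- deviance(mlr2)

# Number of restrictions and residual degrees of freedom of the unrestricted model
q   <- df.residual(mlr1) - df.residual(mlr2)
df2 <- df.residual(mlr2)

# Classical F statistic and its p-value
f_manual <- ((rss1 - rss2) / q) / (rss2 / df2)
p_manual <- pf(f_manual, q, df2, lower.tail = FALSE)

# Same comparison through waldtest and anova
wt <- waldtest(mlr1, mlr2, test = "F")
av <- anova(mlr1, mlr2)
print(c(manual = f_manual, waldtest = wt$F[2], anova = av$F[2]))
print(p_manual)

# Chi-square version of the same test
print(waldtest(mlr1, mlr2, test = "Chisq"))
```

The agreement holds only for the default covariance matrix. If a different estimator is supplied through the vcov argument of waldtest, for example a heteroskedasticity-consistent one, the Wald statistic will generally differ from the anova result.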

References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

[2] AER R Package: Christian Kleiber and Achim Zeileis. (2008). Applied Econometrics with R. Springer-Verlag, New York.

lmtest R Package: Achim Zeileis and Torsten Hothorn. (2002). Diagnostic Checking in Regression Relationships. R News, 2 (3): 7-10.
