Last Update: February 21, 2022
Homogeneity of regression slopes with dummy variables can be tested in R using the lmtest package waldtest function, which evaluates whether linear regression intercept and slopes are homogeneous across populations. The main waldtest function parameters are object, the first fitted model (here the restricted lm object), followed by the further models to compare against it (here the unrestricted lm object), and test, a string specifying whether to do an F-test ("F") or a chi-square test ("Chisq").
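The following is a sketch of what such a test computes. It assumes the classical OLS covariance matrix that vcov() returns for an lm fit. The Wald F statistic for the hypothesis that q coefficients are zero is F = (Rb)'(RVR')^(-1)(Rb) / q. Here b holds the unrestricted estimates, V is their estimated covariance matrix, and R selects the tested coefficients. For nested OLS models this equals the familiar nested-model F test from anova(). The base R code checks this equality on simulated data, so it does not depend on AER or lmtest. The data-generating process, coefficient values and variable names (sim, fit_r, fit_u) are hypothetical.

```r
set.seed(1)
n <- 200

# Hypothetical data: two regressors and a two-level dummy factor
sim <- data.frame(
  x1 = runif(n, 0, 10),
  x2 = rpois(n, 3),
  d  = factor(sample(c("no", "yes"), n, replace = TRUE), levels = c("no", "yes"))
)
dnum <- as.numeric(sim$d == "yes")

# Hypothetical process: group "yes" has a shifted intercept and x1 slope
sim$y <- 2 + 1.5 * sim$x1 + 0.8 * sim$x2 + dnum * (1 + 0.5 * sim$x1) + rnorm(n)

fit_r <- lm(y ~ x1 + x2, data = sim)          # restricted model
fit_u <- lm(y ~ x1 * d + x2 * d, data = sim)  # unrestricted model

# Coefficients set to zero under H0: every term involving the dummy
b   <- coef(fit_u)
V   <- vcov(fit_u)
idx <- grep("dyes", names(b))
q   <- length(idx)

# Selection matrix R, so that R %*% b picks out the tested coefficients
R  <- diag(length(b))[idx, , drop = FALSE]
Rb <- R %*% b
W  <- as.numeric(t(Rb) %*% solve(R %*% V %*% t(R)) %*% Rb)
f_wald <- W / q

# Same hypothesis via the classical nested-model F test
f_anova <- anova(fit_r, fit_u)$F[2]

print(c(Wald_F = f_wald, anova_F = f_anova))
```

With lmtest loaded, waldtest(fit_r, fit_u, test = "F") on these simulated fits should report the same F statistic.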
As an example, we can run a homogeneity Wald test. It compares a restricted multiple linear regression of house prices, explained by lot size and number of bedrooms, with an unrestricted one that also includes air conditioning as a dummy independent variable. The data come from the HousePrices data object included in the AER package [1].
First, we load the AER package for the data and the lmtest package for the Wald test [2].
In [1]:
library(AER)
library(lmtest)
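If either package is not installed, library() stops with an error. The sketch below reports missing packages before that happens, using base R requireNamespace(). The installation line is left commented out, since how you install packages is an assumption about your setup. The names pkgs and missing_pkgs are illustrative.

```r
# Packages used in this post
pkgs <- c("AER", "lmtest")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
available    <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!available]

if (length(missing_pkgs) > 0) {
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```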
Second, we load the HousePrices data object from the AER package using the data function. We then print the first six rows of its first three columns and tenth column (price, lotsize, bedrooms and aircon) using the head function to view the data.frame structure.
In [2]:
data(HousePrices)
head(HousePrices[, c(1:3, 10)])
Out [2]:
price lotsize bedrooms aircon
1 42000 5850 3 no
2 38500 4000 2 no
3 49500 3060 3 no
4 60500 6650 3 no
5 61000 6360 2 no
6 66000 4160 3 yes
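Selecting columns by position (c(1:3, 10)) depends on the column order of HousePrices; selecting them by name does not. The sketch below shows the name-based alternative. So that it runs without AER, it rebuilds the six rows printed in Out [2] as a small data frame; hp6 is a hypothetical stand-in for HousePrices. It stores aircon as a factor with levels "no" and "yes", which matches the airconyes coefficient name in the regression summary, and then checks the structure.

```r
# Six rows copied from Out [2]; hp6 is a hypothetical stand-in for HousePrices
hp6 <- data.frame(
  price    = c(42000, 38500, 49500, 60500, 61000, 66000),
  lotsize  = c(5850, 4000, 3060, 6650, 6360, 4160),
  bedrooms = c(3, 2, 3, 3, 2, 3),
  aircon   = factor(c("no", "no", "no", "no", "no", "yes"),
                    levels = c("no", "yes"))
)

# Select columns by name instead of by position
cols <- c("price", "lotsize", "bedrooms", "aircon")
print(head(hp6[, cols]))

# Check the dummy variable: a factor whose first level ("no") is the reference
str(hp6)
print(levels(hp6$aircon))
print(table(hp6$aircon))

# With the real data the equivalent call would be:
# head(HousePrices[, c("price", "lotsize", "bedrooms", "aircon")])
```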
Third, we fit the restricted multiple linear regression using the lm function and store the results in the mlr1 object. Within the lm function, the parameter formula = price ~ lotsize + bedrooms fits the restricted model, where house price is explained by its lot size and number of bedrooms.
In [3]:
mlr1 <- lm(formula = price ~ lotsize + bedrooms, data = HousePrices)
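Written out, these are the restricted model above and the unrestricted model fitted next. D_i is 1 if house i has air conditioning and 0 otherwise, and the coefficient symbols are introduced here only for illustration. The Wald test asks whether the three extra coefficients can jointly be set to zero, so q = 3 restrictions are tested.

```latex
\begin{aligned}
\text{Restricted:}\quad & \text{price}_i = \beta_0 + \beta_1\,\text{lotsize}_i + \beta_2\,\text{bedrooms}_i + \varepsilon_i \\
\text{Unrestricted:}\quad & \text{price}_i = \beta_0 + \beta_1\,\text{lotsize}_i + \beta_2\,\text{bedrooms}_i + \delta_0 D_i + \delta_1\,(\text{lotsize}_i \cdot D_i) + \delta_2\,(\text{bedrooms}_i \cdot D_i) + \varepsilon_i \\
H_0:\quad & \delta_0 = \delta_1 = \delta_2 = 0
\end{aligned}
```

Under H_0 both groups share the intercept beta_0 and the slopes beta_1 and beta_2, which is exactly the homogeneity being tested.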
Fourth, as another example, we fit the unrestricted multiple linear regression using the lm function, store the results in the mlr2 object and print its summary using the summary.lm function. Within the lm function, the parameter formula = price ~ lotsize + bedrooms + aircon + lotsize*aircon + bedrooms*aircon fits the unrestricted model. In it, house price is explained by its lot size, number of bedrooms and air conditioning as a dummy independent variable, together with the interactions of air conditioning with lot size and number of bedrooms. The aircon term lets the intercept differ between houses with and without air conditioning, and the interaction terms let the lot size and bedrooms slopes differ between them. Notice that the formula parameter could also be written as formula = price ~ lotsize*aircon + bedrooms*aircon. This works because the * operator automatically includes the individual lotsize, bedrooms and aircon independent variables as well as their lotsize:aircon and bedrooms:aircon interaction products in the model equation. Also notice that aircon is a factor. lm encodes it with treatment contrasts, using the no category as the reference level, so it creates an airconyes dummy variable equal to 1 for the yes category and 0 for the no category. Additionally, notice that the aircon dummy independent variable was only included as an educational example and can be modified according to your needs.
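The two claims above can be checked in base R without the data. First, both formulas expand to the same set of terms, although possibly in a different order. Second, model.matrix() turns the aircon factor into an airconyes 0/1 column and builds the interaction column as a product. The toy data frame and the names f_long, f_short and toy are hypothetical; the lot sizes are three of the values printed in Out [2].

```r
# Long form used in In [4] and the shorter equivalent form
f_long  <- price ~ lotsize + bedrooms + aircon + lotsize*aircon + bedrooms*aircon
f_short <- price ~ lotsize*aircon + bedrooms*aircon

# Both expand to the same set of terms (the order may differ)
labels_long  <- attr(terms(f_long), "term.labels")
labels_short <- attr(terms(f_short), "term.labels")
print(labels_long)
print(labels_short)
print(setequal(labels_long, labels_short))

# Dummy coding on a small hypothetical data frame: "no" is the reference level,
# so model.matrix() creates airconyes = 1 for "yes" and 0 for "no"
toy <- data.frame(
  lotsize = c(5850, 4000, 4160),
  aircon  = factor(c("no", "no", "yes"), levels = c("no", "yes"))
)
mm <- model.matrix(~ lotsize * aircon, data = toy)
print(mm)  # lotsize:airconyes is lotsize times the dummy
```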
In [4]:
mlr2 <- lm(formula = price ~ lotsize + bedrooms + aircon + lotsize*aircon + bedrooms*aircon, data = HousePrices)
summary.lm(mlr2)
Out [4]:
Call:
lm(formula = price ~ lotsize + bedrooms + aircon + lotsize *
aircon + bedrooms * aircon, data = HousePrices)
Residuals:
Min 1Q Median 3Q Max
-67843 -12577 -1124 9250 90491
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.536e+04 4.263e+03 3.603 0.000344 ***
lotsize 4.621e+00 4.660e-01 9.915 < 2e-16 ***
bedrooms 7.709e+03 1.326e+03 5.813 1.05e-08 ***
airconyes -1.423e+04 9.434e+03 -1.509 0.131932
lotsize:airconyes 2.438e+00 8.824e-01 2.763 0.005921 **
bedrooms:airconyes 6.125e+03 2.661e+03 2.302 0.021731 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 19370 on 540 degrees of freedom
Multiple R-squared: 0.4785, Adjusted R-squared: 0.4737
F-statistic: 99.09 on 5 and 540 DF, p-value: < 2.2e-16
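Because aircon enters both on its own and through the interactions, the coefficients combine into a separate intercept and separate slopes for each group. The sketch below does that arithmetic with the rounded estimates printed in Out [4]. In a live session, coef(mlr2) would supply the unrounded values. The names est, no_ac and yes_ac are hypothetical.

```r
# Rounded estimates copied from Out [4]
est <- c(
  "(Intercept)"        = 1.536e+04,
  "lotsize"            = 4.621e+00,
  "bedrooms"           = 7.709e+03,
  "airconyes"          = -1.423e+04,
  "lotsize:airconyes"  = 2.438e+00,
  "bedrooms:airconyes" = 6.125e+03
)

# Houses without air conditioning (dummy = 0): the base coefficients
no_ac <- c(intercept = est[["(Intercept)"]],
           lotsize   = est[["lotsize"]],
           bedrooms  = est[["bedrooms"]])

# Houses with air conditioning (dummy = 1): base coefficients plus the shifts
yes_ac <- no_ac + c(est[["airconyes"]],
                    est[["lotsize:airconyes"]],
                    est[["bedrooms:airconyes"]])

print(rbind(no_ac, yes_ac))
```

With these rounded values, the lot size slope is about 4.62 per unit without air conditioning and about 7.06 with it. The bedrooms slope is about 7,709 versus 13,834.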
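Fifth, we run the Wald test using the waldtest function. Within the waldtest function, object = mlr1 passes the restricted mlr1 model, the unrestricted mlr2 model is passed as the next argument, and test = "F" requests an F-test. The null hypothesis is that the three coefficients present only in the unrestricted model (airconyes, lotsize:airconyes and bedrooms:airconyes) are jointly zero. In other words, it states that the intercept and slopes are the same for houses with and without air conditioning. Notice that the mlr1 and mlr2 models and the test = "F" parameter were only included as educational examples and can be modified according to your needs.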
In [5]:
waldtest(object = mlr1, mlr2, test = "F")
Out [5]:
Wald test
Model 1: price ~ lotsize + bedrooms
Model 2: price ~ lotsize + bedrooms + aircon + lotsize * aircon + bedrooms *
aircon
Res.Df Df F Pr(>F)
1 543
2 540 3 37.35 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
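In Out [5], Df = 3 is the number of coefficients dropped from the unrestricted model, and F = 37.35 has a p-value below 2.2e-16. The null hypothesis of a homogeneous intercept and slopes across houses with and without air conditioning is therefore rejected at conventional significance levels. The short sketch below reproduces that decision from the printed numbers using a 5% critical value; the 5% level is an assumption. It also shows the chi-square form, assuming the standard relation W = q * F between the Wald statistic and the F statistic, which should correspond to requesting test = "Chisq" in waldtest.

```r
# Numbers printed in Out [5]
f_stat   <- 37.35  # F statistic
q        <- 3      # Df: number of restrictions
df_resid <- 540    # Res.Df of the unrestricted model
alpha    <- 0.05   # significance level (an assumption; adjust as needed)

# F form: critical value and p-value from the F(q, df_resid) distribution
f_crit <- qf(1 - alpha, df1 = q, df2 = df_resid)
p_f    <- pf(f_stat, df1 = q, df2 = df_resid, lower.tail = FALSE)

# Chi-square form: W = q * F compared with a chi-square(q) distribution
w_stat  <- q * f_stat
p_chisq <- pchisq(w_stat, df = q, lower.tail = FALSE)

reject_h0 <- f_stat > f_crit
print(c(F = f_stat, F_crit = f_crit, p_F = p_f, Chisq = w_stat, p_Chisq = p_chisq))
print(reject_h0)
```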
Courses
My online courses are hosted on the Teachable website.
For more details on this concept, you can view my Linear Regression in R Course.
References
[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.
Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.
[2] AER R Package: Kleiber, C., and Zeileis, A. (2008). Applied Econometrics with R. Springer-Verlag, New York.
lmtest R Package: Zeileis, A., and Hothorn, T. (2002). Diagnostic Checking in Regression Relationships. R News, 2(3), 7–10.