# Homogeneity of Regression Slopes: Dummy Variables in Python

Last Update: February 21, 2022

Homogeneity of regression slopes with dummy variables in Python can be tested using the `wald_test` method of a fitted `statsmodels` regression results object, which evaluates whether the linear regression intercept and slopes are homogeneous across populations.

As an example, we run a homogeneity Wald test on an unrestricted multiple linear regression of house prices explained by lot size, number of bedrooms, and air conditioning as a dummy independent variable, using data from the `HousePrices` object included in the `AER` R package [1].

First, we import the `statsmodels` package for data downloading, multiple linear regression fitting, and Wald testing [2].

``````In [1]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
``````

Second, we create the `houseprices` data object using the `get_rdataset` function and display the first five rows of its first three columns and tenth column using the `print` function and the `head` data frame method to view its structure.

``````In [2]:
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
print(houseprices.iloc[:, [0, 1, 2, 9]].head())
``````
``````Out [2]:
     price  lotsize  bedrooms aircon
0  42000.0     5850         3     no
1  38500.0     4000         2     no
2  49500.0     3060         3     no
3  60500.0     6650         3     no
4  61000.0     6360         2     no
``````
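The article notes below that `ols` converts the `aircon` categories into `0`/`1` values automatically. As a minimal sketch of what that dummy coding amounts to, the same encoding can be built by hand with pandas on a small hypothetical frame (the `aircon_yes` column name is illustrative, not part of the original code):

```python
import pandas as pd

# Hypothetical mini data frame mimicking the aircon column of HousePrices.
df = pd.DataFrame({"aircon": ["no", "yes", "no", "yes"]})

# The formula interface does this encoding internally; an equivalent manual
# dummy maps the "yes" category to 1 and the "no" category to 0.
df["aircon_yes"] = (df["aircon"] == "yes").astype(int)
print(df["aircon_yes"].tolist())  # [0, 1, 0, 1]
```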

Third, we fit the unrestricted multiple linear regression with the `ols` function using variables within the `houseprices` data object, store the results within the `mlr` object, and print the summary results using its `summary` method. Within the `ols` function, the parameter `formula="price ~ lotsize + bedrooms + aircon + lotsize*aircon + bedrooms*aircon"` fits the unrestricted model in which house price is explained by lot size, number of bedrooms, and air conditioning as a dummy independent variable. Notice that the `formula` parameter can also be written as `formula="price ~ lotsize*aircon + bedrooms*aircon"`, because the `*` operator automatically includes the individual `lotsize`, `bedrooms`, and `aircon` independent variables together with their `lotsize:aircon` and `bedrooms:aircon` interaction terms in the model equation. Also notice that the `ols` function automatically converts the `aircon` variable's `yes` category into the numeric value `1` and its `no` category into `0`. Additionally, notice that the `aircon` dummy independent variable is included only as an educational example and can be modified according to your needs.

``````In [3]:
mlr = smf.ols(formula="price ~ lotsize + bedrooms + aircon + lotsize*aircon + bedrooms*aircon", data=houseprices).fit()
print(mlr.summary())
``````
``````Out [3]:
OLS Regression Results
==============================================================================
Dep. Variable:                  price   R-squared:                       0.478
Method:                 Least Squares   F-statistic:                     99.09
Date:                Sat, 23 Oct 2021   Prob (F-statistic):           5.14e-74
Time:                        13:33:49   Log-Likelihood:                -6161.6
No. Observations:                 546   AIC:                         1.234e+04
Df Residuals:                     540   BIC:                         1.236e+04
Df Model:                           5
Covariance Type:            nonrobust
==========================================================================================
                             coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------
Intercept               1.536e+04   4263.209      3.603      0.000    6985.598    2.37e+04
aircon[T.yes]          -1.423e+04   9434.410     -1.509      0.132   -3.28e+04    4297.895
lotsize                    4.6206      0.466      9.915      0.000       3.705       5.536
lotsize:aircon[T.yes]      2.4380      0.882      2.763      0.006       0.705       4.171
bedrooms                7709.3160   1326.284      5.813      0.000    5104.008    1.03e+04
bedrooms:aircon[T.yes]  6125.1574   2661.132      2.302      0.022     897.718    1.14e+04
==============================================================================
Omnibus:                       81.680   Durbin-Watson:                   1.431
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              182.492
Skew:                           0.807   Prob(JB):                     2.36e-40
Kurtosis:                       5.328   Cond. No.                     7.30e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.3e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
``````
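The claim above that the long and short formulas fit the same model can be checked directly. The sketch below uses hypothetical simulated data (so no download is needed; the column names mirror `HousePrices` but the values are invented) and compares the coefficients from both formula spellings:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data standing in for HousePrices (no download needed).
rng = np.random.default_rng(0)
n = 200
sim = pd.DataFrame({
    "lotsize": rng.uniform(2000, 10000, size=n),
    "bedrooms": rng.integers(1, 6, size=n).astype(float),
    "aircon": rng.choice(["no", "yes"], size=n),
})
sim["price"] = (10000 + 5 * sim["lotsize"] + 6000 * sim["bedrooms"]
                + rng.normal(0, 5000, size=n))

# Long form lists each term explicitly; short form relies on the `*` operator.
long_form = smf.ols(
    "price ~ lotsize + bedrooms + aircon + lotsize*aircon + bedrooms*aircon",
    data=sim).fit()
short_form = smf.ols("price ~ lotsize*aircon + bedrooms*aircon", data=sim).fit()

# Both formulas expand to the same six-term design, so coefficients match.
print(np.allclose(long_form.params.sort_index(), short_form.params.sort_index()))
```

Both fits contain the same six parameters (intercept, `aircon[T.yes]`, `lotsize`, `bedrooms`, and the two interactions), which is why the shorthand is safe to use.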

Fourth, we do the Wald test using the `wald_test` method, store the results within the `waldtest` object, and print them. Within `wald_test`, the parameter `r_matrix="aircon[T.yes] = lotsize:aircon[T.yes] = bedrooms:aircon[T.yes] = 0"` specifies the joint null hypothesis that the coefficients of the air conditioning dummy and of its interactions with lot size and bedrooms are all zero, and the parameter `use_f=True` performs an F-test. Notice that the unrestricted `mlr` model results and the `use_f=True` parameter are included only as educational examples and can be modified according to your needs.

``````In [4]:
waldtest = mlr.wald_test(r_matrix="aircon[T.yes] = lotsize:aircon[T.yes] = bedrooms:aircon[T.yes] = 0", use_f=True)
print(waldtest)
``````
``````Out [4]:
<F test: F=array([[37.35040103]]), p=6.030790422224445e-22, df_denom=540, df_num=3>
``````
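For intuition, the F-statistic that `wald_test` reports with `use_f=True` can also be recovered by comparing the residual sums of squares of the restricted and unrestricted models. The sketch below uses hypothetical simulated data (column names mirror `HousePrices` but values are invented) and checks that the manual F-statistic matches the `wald_test` output:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data standing in for HousePrices (no download needed).
rng = np.random.default_rng(1)
n = 200
sim = pd.DataFrame({
    "lotsize": rng.uniform(2000, 10000, size=n),
    "bedrooms": rng.integers(1, 6, size=n).astype(float),
    "aircon": rng.choice(["no", "yes"], size=n),
})
sim["price"] = (10000 + 5 * sim["lotsize"] + 6000 * sim["bedrooms"]
                + rng.normal(0, 5000, size=n))

# Unrestricted model includes the dummy and its interactions; the restricted
# model imposes the null hypothesis by dropping those three terms.
unrestricted = smf.ols("price ~ lotsize*aircon + bedrooms*aircon", data=sim).fit()
restricted = smf.ols("price ~ lotsize + bedrooms", data=sim).fit()

q = 3  # number of restrictions: the dummy and its two interaction terms
f_manual = ((restricted.ssr - unrestricted.ssr) / q) / (
    unrestricted.ssr / unrestricted.df_resid)

wt = unrestricted.wald_test(
    "aircon[T.yes] = lotsize:aircon[T.yes] = bedrooms:aircon[T.yes] = 0",
    use_f=True)
f_wald = float(np.squeeze(wt.fvalue))

print(np.isclose(f_manual, f_wald))
```

A small p-value, as in the `HousePrices` output above, rejects the joint null and indicates that the intercept and slopes differ between houses with and without air conditioning.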

## Courses

My online courses are hosted on the Teachable website.

For more details on this concept, you can view my Linear Regression in Python Course.

## References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

[2] statsmodels Python package: Seabold, Skipper, and Josef Perktold. (2010). “statsmodels: Econometric and statistical modeling with python.” Proceedings of the 9th Python in Science Conference.
