Last Update: February 21, 2022
Multiple linear regression in Python can be fitted with the ols function from the statsmodels package, found in the statsmodels.formula.api module. The main parameters of ols are formula, a model description string of the form "y ~ x1 + … + xp", and data, a data frame containing the model variables. For example, the code line ols(formula="y ~ x1 + x2", data=model_data).fit() fits a model using the variables in the model_data object.
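The generic call can be tried on made-up data before moving to a real data set. The sketch below is only an illustration: the model_data frame, its column names and the true coefficients (1, 2 and -3) are assumptions chosen so that the estimates can be compared with known values.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (assumed for illustration): y = 1 + 2*x1 - 3*x2 + small noise
rng = np.random.default_rng(0)
n = 200
model_data = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
noise = rng.normal(scale=0.1, size=n)
model_data["y"] = 1.0 + 2.0 * model_data["x1"] - 3.0 * model_data["x2"] + noise

# The formula interface adds the intercept term automatically
fit = smf.ols(formula="y ~ x1 + x2", data=model_data).fit()
print(fit.params)
```

The printed estimates should land close to the values used to generate the data, which is a quick way to confirm that the formula string is read as intended.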
As an example, we can fit a multiple linear regression of house price explained by lot size and number of bedrooms, using the HousePrices data set from the AER R package [1].
First, we import the statsmodels package for data downloading and model fitting [2].
In [1]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
Second, we create the houseprices data object with the get_rdataset function and display the first five rows and first three columns of the data, using the print function and the head data frame method, to view its structure.
In [2]:
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
print(houseprices.iloc[:, 0:3].head())
Out [2]:
     price  lotsize  bedrooms
0  42000.0     5850         3
1  38500.0     4000         2
2  49500.0     3060         3
3  60500.0     6650         3
4  61000.0     6360         2
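Before fitting, it can help to confirm that every variable named in the formula is present, numeric and free of missing values. The sketch below runs that check on a small frame typed in by hand from the five rows printed above. The preview object and the required list are hypothetical names used only for this illustration, and the same checks could be applied to the full houseprices object.

```python
import pandas as pd

# The five rows shown in Out [2], typed in by hand (not the full data set)
preview = pd.DataFrame({
    "price": [42000.0, 38500.0, 49500.0, 60500.0, 61000.0],
    "lotsize": [5850, 4000, 3060, 6650, 6360],
    "bedrooms": [3, 2, 3, 3, 2],
})

# Variables that the formula "price ~ lotsize + bedrooms" will need
required = ["price", "lotsize", "bedrooms"]

missing_columns = [c for c in required if c not in preview.columns]
n_missing_values = int(preview[required].isna().sum().sum())

print(missing_columns)    # columns absent from the frame, if any
print(n_missing_values)   # total count of missing cells
print(preview.dtypes)     # column types, to confirm they are numeric
```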
Third, we fit the model with the ols function using the variables in the houseprices data object, store the result in the mlr object and print its params attribute to view the coefficient estimates. Within the ols function, formula="price ~ lotsize + bedrooms" specifies a model where house price is explained by lot size and number of bedrooms.
In [3]:
mlr = smf.ols(formula="price ~ lotsize + bedrooms", data=houseprices).fit()
print(mlr.params)
Out [3]:
Intercept     5612.599731
lotsize          6.053022
bedrooms     10567.351501
dtype: float64
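The estimates can be read as a prediction equation: predicted price equals the intercept plus the lotsize coefficient times lot size plus the bedrooms coefficient times number of bedrooms. Each coefficient is the estimated change in price for a one-unit increase in that variable with the other held fixed. As a minimal sketch, the code below applies the coefficients copied from Out [3] to a hypothetical house. The lot size of 5000 and the three bedrooms are assumed values chosen only for illustration.

```python
# Coefficient estimates copied from Out [3]
intercept = 5612.599731
b_lotsize = 6.053022
b_bedrooms = 10567.351501

# Hypothetical house (assumed values, for illustration only)
lotsize = 5000
bedrooms = 3

# Fitted equation: price = intercept + b_lotsize*lotsize + b_bedrooms*bedrooms
predicted_price = intercept + b_lotsize * lotsize + b_bedrooms * bedrooms
print(round(predicted_price, 2))  # 67579.76
```

With the fitted mlr object available, the same prediction should be obtainable by passing a data frame with lotsize and bedrooms columns to mlr.predict.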
References
[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.
Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.
[2] Seabold, S., and Perktold, J. (2010). statsmodels: Econometric and statistical modeling with Python. Proceedings of the 9th Python in Science Conference.