
Multiple Linear Regression in Python

Last Update: February 21, 2022

Multiple linear regression in Python can be fitted with the ols function from the statsmodels.formula.api module of the statsmodels package. Its main parameters are formula, a model description string of the form "y ~ x1 + … + xp", and data, a data frame containing the model variables. For example, the code line ols(formula="y ~ x1 + x2", data=model_data).fit() fits the model \hat{y}_{i} = \hat{\beta}_{0} + \hat{\beta}_{1} x_{1i} + \hat{\beta}_{2} x_{2i} using the variables in the model_data object.
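To make the fitted equation concrete, the sketch below computes the same kind of least-squares estimates directly with NumPy instead of statsmodels. The data are made up for illustration (y is generated exactly as 1 + 2·x1 + 3·x2, with no noise), and the names x1, x2, y, X and beta_hat are assumptions that do not appear in the example that follows.

```python
import numpy as np

# Hypothetical, noise-free data generated as y = 1 + 2*x1 + 3*x2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares: choose beta to minimise ||y - X @ beta||^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

Because the data contain no noise, the estimates should recover the intercept and slopes used to generate y, up to floating-point error. The formula "y ~ x1 + x2" requests the same kind of fit from ols, with the intercept column added automatically.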

As an example, we fit a multiple linear regression of house price on lot size and number of bedrooms, using the HousePrices data set from the AER R package [1].

First, we import the statsmodels package, which is used for both downloading the data and fitting the model [2].

In [1]:
import statsmodels.api as sm
import statsmodels.formula.api as smf

Second, we create the houseprices data object with the get_rdataset function and, to view its structure, print its first five rows and first three columns using the head data frame method.

In [2]:
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
print(houseprices.iloc[:, 0:3].head())
Out [2]:
     price  lotsize  bedrooms
0  42000.0     5850         3
1  38500.0     4000         2
2  49500.0     3060         3
3  60500.0     6650         3
4  61000.0     6360         2

Third, we fit the model with the ols function using the variables in the houseprices data object, store the result in the mlr object and print its params attribute to view the coefficient estimates. In the ols function, formula="price ~ lotsize + bedrooms" specifies the model \hat{price} = \hat{\beta}_{0} + \hat{\beta}_{1} lotsize + \hat{\beta}_{2} bedrooms, where house price is explained by lot size and number of bedrooms.

In [3]:
mlr = smf.ols(formula="price ~ lotsize + bedrooms", data=houseprices).fit()
print(mlr.params)
Out [3]:
Intercept     5612.599731
lotsize          6.053022
bedrooms     10567.351501
dtype: float64
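The estimates in Out [3] can be plugged back into the fitted equation to obtain a predicted price. The sketch below does this by hand for the first house shown in Out [2] (lot size 5850, 3 bedrooms, observed price 42000). The function name predict_price is hypothetical, and the coefficients are copied from the output above rather than read from the mlr object.

```python
# Coefficient estimates copied from Out [3]
INTERCEPT = 5612.599731
B_LOTSIZE = 6.053022
B_BEDROOMS = 10567.351501


def predict_price(lotsize, bedrooms):
    """Fitted price from price = b0 + b1 * lotsize + b2 * bedrooms."""
    return INTERCEPT + B_LOTSIZE * lotsize + B_BEDROOMS * bedrooms


# First row of Out [2]: lotsize 5850, bedrooms 3, observed price 42000
fitted = predict_price(lotsize=5850, bedrooms=3)
residual = 42000.0 - fitted  # observed minus fitted

print(round(fitted, 2), round(residual, 2))
```

With the fitted results object available, calling mlr.predict on a data frame that holds lotsize and bedrooms columns should give the same fitted value.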




[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

[2] Seabold, Skipper, and Josef Perktold. (2010). “statsmodels: Econometric and statistical modeling with python.” Proceedings of the 9th Python in Science Conference.
