Last Update: February 21, 2022
Multiple linear regression in Python can be fitted using the ols function from the statsmodels.formula.api module of the statsmodels package. The main parameters of the ols function are formula, a model description string of the form "y ~ x1 + … + xp", and data, a data frame containing the model variables. For example, ols(formula="y ~ x1 + x2", data=model_data).fit() fits the model using the variables in the model_data object.
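As a minimal sketch of the call pattern above, the following fits ols to synthetic data. The model_data frame, the true coefficients (1.0, 2.0 and -3.0), the noise level and the random seed are all assumptions chosen for illustration; they are not part of the house price example that follows.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data: y = 1 + 2*x1 - 3*x2 + small noise
rng = np.random.default_rng(0)
n = 500
model_data = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
})
model_data["y"] = (
    1.0
    + 2.0 * model_data["x1"]
    - 3.0 * model_data["x2"]
    + rng.normal(scale=0.1, size=n)
)

# The formula string names the response on the left of "~"
# and the explanatory variables on the right; an intercept is added by default
fit = smf.ols(formula="y ~ x1 + x2", data=model_data).fit()
print(fit.params)
```

With this setup the estimates should land close to the coefficients used to generate the data, which is a quick way to check that the formula was specified as intended.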
As an example, we can fit a multiple linear regression of house price explained by lot size and number of bedrooms using the HousePrices data set from the AER R package [1].
First, we import the statsmodels package for data downloading and model fitting [2].
In [1]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
Second, we create the houseprices data object using the get_rdataset function, then display its first five rows and first three columns using the print function and the head data frame method to view its structure.
In [2]:
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
print(houseprices.iloc[:, 0:3].head())
Out [2]:
     price  lotsize  bedrooms
0  42000.0     5850         3
1  38500.0     4000         2
2  49500.0     3060         3
3  60500.0     6650         3
4  61000.0     6360         2
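The column and row selection used above can be tried without downloading anything. The sketch below rebuilds the five displayed rows by hand; the demo name and the fourth column called extra are hypothetical and only there to show that iloc drops it.

```python
import pandas as pd

# Rows copied from the output above; "extra" is a hypothetical fourth column
demo = pd.DataFrame({
    "price": [42000.0, 38500.0, 49500.0, 60500.0, 61000.0],
    "lotsize": [5850, 4000, 3060, 6650, 6360],
    "bedrooms": [3, 2, 3, 3, 2],
    "extra": ["a", "b", "c", "d", "e"],
})

# iloc[:, 0:3] keeps all rows and the columns at positions 0, 1 and 2
# (the end position 3 is excluded)
subset = demo.iloc[:, 0:3]

# head() returns the first five rows by default; head(2) returns the first two
first_two = subset.head(2)
print(first_two)
```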
Third, we fit the model with the ols function using the variables in the houseprices data object, store the result in the mlr object, and print its params attribute to view the coefficient estimates. Within the ols function, formula="price ~ lotsize + bedrooms" specifies a model where house price is explained by lot size and number of bedrooms.
In [3]:
mlr = smf.ols(formula="price ~ lotsize + bedrooms", data=houseprices).fit()
print(mlr.params)
Out [3]:
Intercept     5612.599731
lotsize          6.053022
bedrooms     10567.351501
dtype: float64
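To see what these estimates mean in practice, the fitted equation can be evaluated by hand. The sketch below copies the three coefficients from the output above and applies them to the first house in the data (lot size 5850, 3 bedrooms); the predicted_price helper is a hypothetical name introduced only for this illustration.

```python
# Coefficient estimates copied from the output above
intercept = 5612.599731
b_lotsize = 6.053022
b_bedrooms = 10567.351501


def predicted_price(lotsize, bedrooms):
    """Fitted value from the estimated linear equation."""
    return intercept + b_lotsize * lotsize + b_bedrooms * bedrooms


# First house in the data: lot size 5850, 3 bedrooms (observed price 42000.0)
print(round(predicted_price(5850, 3), 2))  # 72724.83
```

The gap between this fitted value and the observed price of 42000.0 is the residual for that observation. The predict method of the fitted results object, called with a data frame containing lotsize and bedrooms columns, should return the same fitted values up to rounding of the copied coefficients.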
References
[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.
Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.
[2] Seabold, S., and Perktold, J. (2010). statsmodels: Econometric and Statistical Modeling with Python. Proceedings of the 9th Python in Science Conference.