Simple Linear Regression in Python

Last Update: February 21, 2022

Simple linear regression in Python can be fitted with the ols function from the statsmodels.formula.api module of the statsmodels package. Its main parameters are formula, a model description string such as "y ~ x", and data, a data frame containing the model variables. For example, ols(formula="y ~ x", data=model_data).fit() fits the model \hat{y}_{i} = \hat{\beta}_{0} + \hat{\beta}_{1} x_{i} using the variables in the model_data object. The intercept \hat{\beta}_{0} is included automatically by the formula interface.
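To make the fitted equation concrete, here is a minimal sketch of the textbook closed-form least squares estimates for a single predictor, written in plain Python. The function name fit_simple_ols and the toy data are hypothetical and only for illustration; this is not how statsmodels computes the fit internally.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least squares line through (x, y)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    slope = s_xy / s_xx                    # estimate of beta_1
    intercept = y_mean - slope * x_mean    # estimate of beta_0
    return intercept, slope


# Hypothetical toy data, not taken from the HousePrices data set
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(toy_x, toy_y)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

For a model with one predictor, the ols function should return the same two estimates up to floating point error.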

As an example, we fit a simple linear regression of house price explained by lot size using the HousePrices data set from the AER R package [1].

First, we import the statsmodels package for downloading data and fitting the model, and the seaborn package for charting [2].

In [1]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
import seaborn as sns

Second, we create the houseprices data object with the get_rdataset function. To view its structure, we print the first five rows of its first two columns using the head data frame method.

In [2]:
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
print(houseprices.iloc[:, 0:2].head())
Out[2]:
     price  lotsize
0  42000.0     5850
1  38500.0     4000
2  49500.0     3060
3  60500.0     6650
4  61000.0     6360

Third, we draw a scatter chart with the regression line colored red and its confidence interval hidden (ci=None).

In [3]:
sns.regplot(x="lotsize", y="price", data=houseprices, ci=None, line_kws={"color": "red"})
Out[3]:
Figure 1. Simple linear regression scatter chart of house price explained by its lot size.

Fourth, we fit the model with the ols function using variables from the houseprices data object, store the result in the slr object and print its params attribute to view the coefficient estimates. Within the ols function, formula="price ~ lotsize" fits the model \hat{price} = \hat{\beta}_{0} + \hat{\beta}_{1} lotsize, where house price is explained by lot size.

In [4]:
slr = smf.ols(formula="price ~ lotsize", data=houseprices).fit()
print(slr.params)
Out[4]:
Intercept    34136.191565
lotsize          6.598768
dtype: float64
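The estimates above define the fitted line, so a predicted price follows by plugging in a lot size. The sketch below hard-codes the two coefficients printed in Out[4]. The function name predict_price and the lot size of 5000 are hypothetical choices for illustration.

```python
# Coefficient estimates copied from the printed slr.params output
INTERCEPT = 34136.191565
SLOPE = 6.598768


def predict_price(lotsize):
    """Predicted house price from the fitted line for a given lot size."""
    return INTERCEPT + SLOPE * lotsize


example_lotsize = 5000  # hypothetical lot size, chosen only for illustration
predicted = predict_price(example_lotsize)
print(f"predicted price = {predicted:.2f}")

# The slope is the change in predicted price per one-unit increase in lot size
print(f"change per unit = {predict_price(5001) - predict_price(5000):.6f}")
```

With the fitted slr object available, calling slr.predict on a data frame that has a lotsize column should give the same value up to rounding of the printed coefficients.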

References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

[2] Seabold, Skipper, and Josef Perktold. (2010). “statsmodels: Econometric and statistical modeling with python.” Proceedings of the 9th Python in Science Conference.

Waskom, M. L., (2021). “seaborn: statistical data visualization”. Journal of Open Source Software, 6(60), 3021.
