# Simple Linear Regression in Python

Last Update: February 21, 2022

Simple linear regression in Python can be fitted with the ols function from the statsmodels.formula.api module of the statsmodels package. Its main parameters are formula, a model description string such as "y ~ x", and data, a data frame containing the model variables. Therefore, ols(formula="y ~ x", data=model_data).fit() fits the model using the variables in the model_data object.
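As a minimal sketch of that call, the following fits "y ~ x" on a small made-up data frame. The toy values and the names toy_fit, intercept and slope are assumptions for illustration only. They are chosen so that y is exactly 2 + 3x, which makes the estimated coefficients easy to check by eye.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data: y is exactly 2 + 3 * x, so the fit is easy to verify.
model_data = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
model_data["y"] = 2.0 + 3.0 * model_data["x"]

# formula names the response (left of ~) and the predictor (right of ~);
# data supplies the columns those names refer to.
toy_fit = smf.ols(formula="y ~ x", data=model_data).fit()

# params is a pandas Series indexed by term name.
intercept = toy_fit.params["Intercept"]
slope = toy_fit.params["x"]
print(intercept, slope)
```

The formula interface adds the intercept term automatically, which is why params contains an Intercept entry even though the formula only mentions x.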

As an example, we can fit a simple linear regression of house price explained by lot size using the HousePrices data set from the AER R package.

First, we import the statsmodels package for data downloading and model fitting, and the seaborn package for charting.

In :
import statsmodels.api as sm
import statsmodels.formula.api as smf
import seaborn as sns

Second, we create the houseprices data object using the get_rdataset function, then display the first five rows of its first two columns using the print function and the head data frame method to view its structure.

In :
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
print(houseprices.iloc[:, 0:2].head())
Out:
     price  lotsize
0  42000.0     5850
1  38500.0     4000
2  49500.0     3060
3  60500.0     6650
4  61000.0     6360

Third, we draw a scatter chart of price against lot size with the regression line colored in red; ci=None hides its confidence interval.

In :
sns.regplot(x="lotsize", y="price", data=houseprices, ci=None, line_kws={"color": "red"})
Out:
[Figure: scatter chart of price against lotsize with the regression line in red]

Fourth, we fit the model with the ols function using variables from the houseprices data object, store the result in the slr object and print its params attribute to view the coefficient estimates. Within the ols function, formula="price ~ lotsize" specifies a model where house price is explained by lot size.

In :
slr = smf.ols(formula="price ~ lotsize", data=houseprices).fit()
print(slr.params)
Out :
Intercept    34136.191565
lotsize          6.598768
dtype: float64
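To read these estimates, the fitted line is price = Intercept + lotsize coefficient × lotsize. The short sketch below applies that equation by hand using the coefficients printed above. The function name predicted_price and the lot size of 5000 are hypothetical choices for illustration.

```python
# Coefficient estimates copied from the printed slr.params output.
intercept = 34136.191565
slope = 6.598768


def predicted_price(lotsize):
    """Return the fitted price for a given lot size."""
    return intercept + slope * lotsize


lot = 5000  # hypothetical lot size
print(round(predicted_price(lot), 2))  # 67130.03

# One additional unit of lot size raises the fitted price by the slope.
print(round(predicted_price(lot + 1) - predicted_price(lot), 6))
```

With the fitted model at hand, slr.predict applied to a data frame with a lotsize column should give the same fitted values without copying the coefficients manually.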


Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

Seabold, S., and Perktold, J. (2010). statsmodels: Econometric and statistical modeling with Python. Proceedings of the 9th Python in Science Conference.

Waskom, M. L. (2021). seaborn: statistical data visualization. Journal of Open Source Software, 6(60), 3021.
