Last Update: February 21, 2022
Simple linear regression in Python can be fitted with the ols function from the statsmodels.formula.api module of the statsmodels package. The main parameters of ols are formula, a model description string such as "y ~ x", and data, a data frame containing the model variables. For example, the code line ols(formula="y ~ x", data=model_data).fit() fits a model of y explained by x using the variables in the model_data object.
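The formula interface described above can be tried on a small data frame before moving to real data. The following is a minimal sketch, assuming pandas and statsmodels are installed; the x and y values are hypothetical toy numbers that are not part of this article's data set. It also checks the fitted coefficients against the closed-form solution for simple linear regression (slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)).

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data (not from the article), small enough to verify by hand.
model_data = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.1, 3.9, 6.2, 7.8, 10.1],
})

# Fit y = b0 + b1 * x; the formula interface adds the intercept automatically.
toy_fit = smf.ols(formula="y ~ x", data=model_data).fit()

# Closed-form check for simple linear regression:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
slope = model_data["x"].cov(model_data["y"]) / model_data["x"].var()
intercept = model_data["y"].mean() - slope * model_data["x"].mean()

print(toy_fit.params)      # coefficients estimated by statsmodels
print(intercept, slope)    # the same coefficients computed by hand
```

For these toy values both approaches give an intercept of about 0.05 and a slope of about 1.99, which suggests the formula string and the data frame are being read as intended.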
As an example, we fit a simple linear regression of house price explained by lot size, using the HousePrices data set from the AER R package [1].
First, we import the statsmodels package for data downloading and model fitting, and the seaborn package for charting [2].
In [1]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
import seaborn as sns
Second, we create the houseprices data object with the get_rdataset function, then display its first five rows and first two columns using the print function and the head data frame method to view its structure.
In [2]:
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
print(houseprices.iloc[:, 0:2].head())
Out[2]:
     price  lotsize
0  42000.0     5850
1  38500.0     4000
2  49500.0     3060
3  60500.0     6650
4  61000.0     6360
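The slicing step can also be followed without downloading anything. The sketch below retypes the five rows printed in Out[2] into a small data frame, assuming only that pandas is installed. The sample and first_two names and the price_per_lotsize column are illustrative additions, not part of the HousePrices data set; the extra column is there only so that the column selection has something to drop.

```python
import pandas as pd

# The five rows printed in Out[2], retyped by hand.
sample = pd.DataFrame({
    "price": [42000.0, 38500.0, 49500.0, 60500.0, 61000.0],
    "lotsize": [5850, 4000, 3060, 6650, 6360],
})

# Illustrative derived column (not part of the original data set).
sample["price_per_lotsize"] = sample["price"] / sample["lotsize"]

# iloc[:, 0:2] keeps all rows and the columns at positions 0 and 1
# (the end of the slice is exclusive), so the derived column is dropped.
first_two = sample.iloc[:, 0:2]

# head() returns the first five rows by default; head(3) returns the first three.
print(first_two.head())
print(first_two.head(3))
```

Because iloc selects by position rather than by name, the same call keeps working if the columns are renamed, but it silently picks different variables if the column order changes.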
Third, we draw a scatter chart with the regression line colored red and its confidence interval hidden (ci=None).
In [3]:
sns.regplot(x="lotsize", y="price", data=houseprices, ci=None, line_kws={"color": "red"})
Out[3]:

Figure 1. Simple linear regression scatter chart of house price explained by its lot size.
Fourth, we fit the model with the ols function using the variables in the houseprices data object, store the result in the slr object, and print its params attribute to view the coefficient estimates. Within ols, the parameter formula="price ~ lotsize" specifies a model where house price is explained by lot size.
In [4]:
slr = smf.ols(formula="price ~ lotsize", data=houseprices).fit()
print(slr.params)
Out[4]:
Intercept    34136.191565
lotsize          6.598768
dtype: float64
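The estimates above define the fitted line price = 34136.19 + 6.60 × lotsize, so each additional unit of lot size is associated with an increase of about 6.60 in the expected price. As a minimal sketch of how the coefficients can be used, the snippet below hard-codes the values from Out[4] and evaluates the line at a few lot sizes. The predict_price helper and the chosen lot sizes are hypothetical and serve only as illustration.

```python
# Coefficient estimates copied from Out[4].
INTERCEPT = 34136.191565
SLOPE = 6.598768


def predict_price(lotsize):
    """Return the fitted price for a given lot size using the estimated line."""
    return INTERCEPT + SLOPE * lotsize


# Hypothetical lot sizes chosen for illustration.
for lotsize in (4000, 5000, 6000):
    print(lotsize, round(predict_price(lotsize), 2))
```

With the fitted slr object available, the same values should be obtainable from its predict method by passing a data frame that has a lotsize column; the hand-written version is shown here only to make the arithmetic explicit.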
References
[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.
Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.
[2] Seabold, Skipper, and Josef Perktold. (2010). “statsmodels: Econometric and statistical modeling with python.” Proceedings of the 9th Python in Science Conference.
Waskom, M. L. (2021). “seaborn: statistical data visualization.” Journal of Open Source Software, 6(60), 3021.