Last Update: February 21, 2022
Linear Regression: Residual Standard Error in Python can be estimated using the ols function from the statsmodels.formula.api module of the statsmodels package, the mse_resid property of the fitted model, and the sqrt function from the numpy package. It is a measure of linear regression goodness of fit. The main parameters of the ols function are formula, a model description string of the form "y ~ x1 + … + xp", and data, a data frame object that includes the model variables. The main parameter of the sqrt function is x, the value whose square root is calculated.
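For reference, the quantity being estimated can be written out explicitly. The notation below is introduced here for illustration and assumes a model with an intercept and p predictors fitted to n observations, where the numerator is the residual sum of squares and the denominator is the residual degrees of freedom:

```latex
\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{n - p - 1}}
```

In terms of the fitted model properties used in this article, the numerator corresponds to the sum of squared resid values, the denominator to df_resid, and their ratio to mse_resid.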
As an example, we can estimate the residual standard error from a multiple linear regression of house price explained by lot size and number of bedrooms, using the HousePrices data object included in the AER R package [1].
First, we import the numpy package for the square root calculation and the statsmodels package for downloading the data and fitting the model [2].
In [1]:
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
Second, we create the houseprices data object using the get_rdataset function, then display its first five rows and first three columns using the print function and the head data frame method to view its structure.
In [2]:
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
print(houseprices.iloc[:, 0:3].head())
Out [2]:
     price  lotsize  bedrooms
0  42000.0     5850         3
1  38500.0     4000         2
2  49500.0     3060         3
3  60500.0     6650         3
4  61000.0     6360         2
Third, we fit the model with the ols function using variables from the houseprices data object and store the outcome in the mlr object. Within the ols function, the parameter formula="price ~ lotsize + bedrooms" specifies a model where house price is explained by lot size and number of bedrooms.
In [3]:
mlr = smf.ols(formula="price ~ lotsize + bedrooms", data=houseprices).fit()
Fourth, we can print the estimated residual standard error of the mlr model using the sqrt function and the model's mse_resid property.
In [4]:
print(np.sqrt(mlr.mse_resid))
Out [4]:
21229.04501315886
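Fifth, we can also print the estimated residual standard error of the mlr model using the sqrt function and the model's resid and df_resid properties, which reproduces the previous result from its definition: the square root of the sum of squared residuals divided by the residual degrees of freedom.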
In [5]:
print(np.sqrt(sum(mlr.resid ** 2) / mlr.df_resid))
Out [5]:
21229.045013158862
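As a cross-check that does not depend on statsmodels, the same calculation can be sketched with numpy alone. This is a minimal illustration, not the method used above: the function name residual_standard_error and the four-point data set are hypothetical, and the sketch assumes a model with an intercept whose residual degrees of freedom are n - p - 1.

```python
import numpy as np


def residual_standard_error(X, y):
    """Return sqrt(RSS / (n - p - 1)) for a least squares fit with intercept.

    Hypothetical helper for illustration; X holds p predictor columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        # A single predictor passed as a flat sequence becomes one column.
        X = X.reshape(-1, 1)
    n, p = X.shape
    # Prepend a column of ones so the model includes an intercept.
    design = np.column_stack([np.ones(n), X])
    # Least squares estimate of the intercept and slope coefficients.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    # Residual degrees of freedom: observations minus estimated coefficients.
    df_resid = n - p - 1
    return float(np.sqrt(np.sum(resid ** 2) / df_resid))


# Made-up data with one predictor: the residual sum of squares is 1.8
# and there are 2 residual degrees of freedom, so the result is sqrt(0.9).
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0, 4.0]
rse = residual_standard_error(x, y)
print(rse)  # approximately 0.9487
```

Passing the lotsize and bedrooms columns as X and the price column as y should, under the same assumptions, agree with the values printed above up to floating point rounding.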
References
[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.
Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.
[2] numpy Python package: Harris, C. R., et al. (2020). Array programming with NumPy. Nature, 585, 357–362.
statsmodels Python package: Seabold, S., and Perktold, J. (2010). statsmodels: Econometric and statistical modeling with Python. Proceedings of the 9th Python in Science Conference.