# Linear Regression: Residual Standard Error in Python

Last Update: February 21, 2022

Linear regression residual standard error in Python can be estimated with three pieces. The first is the `statsmodels` package `ols` function from the `statsmodels.formula.api` module. The second is the fitted model's `mse_resid` property. The third is the `numpy` package `sqrt` function.

The residual standard error measures linear regression goodness of fit. It estimates the standard deviation of the model errors, expressed in the units of the dependent variable. The main parameters of the `ols` function are `formula`, which takes a `"y ~ x1 + … + xp"` model description string, and `data`, which takes a data frame containing the model variables. The main parameter of the `sqrt` function is `x`, the value whose square root is calculated.
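For reference, the standard definition of the residual standard error is shown below. The notation is introduced here for illustration: $n$ observations, $p$ explanatory variables plus an intercept, and residuals $\hat{e}_i = y_i - \hat{y}_i$.

$$
\text{RSE} = \hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} \hat{e}_i^{2}}{n - p - 1}}
$$

In `statsmodels`, the fraction under the square root is what the `mse_resid` property holds. For a model with an intercept, its denominator $n - p - 1$ is what `df_resid` holds.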

As an example, we can estimate the residual standard error of a multiple linear regression in which house price is explained by lot size and number of bedrooms. The data come from the `AER` R package `HousePrices` object [1].
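Written as an equation, the model being estimated is shown below. The coefficients $\beta_0, \beta_1, \beta_2$ and the error term $\varepsilon_i$ are notation introduced here, not taken from the original source.

$$
\text{price}_i = \beta_0 + \beta_1 \, \text{lotsize}_i + \beta_2 \, \text{bedrooms}_i + \varepsilon_i
$$

With two explanatory variables and an intercept, three coefficients are estimated. The residual degrees of freedom are therefore the number of observations minus three.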

First, we import the `numpy` package for square root calculation and the `statsmodels` package for data downloading and model fitting [2].

``````In [1]:
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
``````

Second, we create the `houseprices` data object with the `get_rdataset` function. To view its structure, we display its first five rows and first three columns using the `print` function and the `head` data frame method.

``````In [2]:
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
print(houseprices.iloc[:, 0:3].head())
``````
``````Out [2]:
     price  lotsize  bedrooms
0  42000.0     5850         3
1  38500.0     4000         2
2  49500.0     3060         3
3  60500.0     6650         3
4  61000.0     6360         2
``````

Third, we fit the model with the `ols` function and its `fit` method, using the variables in the `houseprices` data object. We store the result in the `mlr` object. The `ols` parameter `formula="price ~ lotsize + bedrooms"` specifies a model where house price is explained by lot size and number of bedrooms.

``````In [3]:
mlr = smf.ols(formula="price ~ lotsize + bedrooms", data=houseprices).fit()
``````

Fourth, we print the estimated residual standard error of the `mlr` model. It is the square root, from the `sqrt` function, of the model's `mse_resid` property, which holds the residual mean squared error.

``````In [4]:
print(np.sqrt(mlr.mse_resid))
``````
``````Out [4]:
21229.04501315886
``````
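The `mse_resid` property can be unpacked further. The sketch below uses simulated data with hypothetical variables `x1`, `x2` and `y` and an assumed error standard deviation of 2.0. It checks two identities: `mse_resid` equals the sum of squared residuals `ssr` divided by `df_resid`, and `df_resid` equals `nobs` minus `df_model` minus one for the intercept.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data; the true error standard deviation is 2.0.
rng = np.random.default_rng(seed=0)
n = 500
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1.0 + 3.0 * df["x1"] - 2.0 * df["x2"] + rng.normal(scale=2.0, size=n)

fit = smf.ols(formula="y ~ x1 + x2", data=df).fit()

# Residual degrees of freedom: observations minus slopes minus intercept.
print(fit.nobs, fit.df_model, fit.df_resid)

# mse_resid is the sum of squared residuals (ssr) divided by df_resid.
print(fit.mse_resid, fit.ssr / fit.df_resid)

# Residual standard error; it should land near the simulated error SD of 2.0.
rse = np.sqrt(fit.mse_resid)
print(rse)
```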

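Fifth, we can also compute the residual standard error of the `mlr` model directly from its `resid` and `df_resid` properties. We take the square root, with the `sqrt` function, of the sum of squared residuals divided by the residual degrees of freedom. Both approaches return the same value up to floating-point rounding.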

``````In [5]:
print(np.sqrt(sum(mlr.resid ** 2) / mlr.df_resid))
``````
``````Out [5]:
21229.045013158862
``````
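The same calculation does not depend on `statsmodels`. The sketch below assumes simulated data and uses `numpy.linalg.lstsq` as a swapped-in least-squares solver to compute the residual standard error from scratch.

```python
import numpy as np

# Hypothetical simulated data: n observations, p = 2 explanatory variables.
rng = np.random.default_rng(seed=2)
n, p = 300, 2
X = rng.normal(size=(n, p))
y = 4.0 + X @ np.array([1.5, -0.5]) + rng.normal(scale=3.0, size=n)

# Design matrix with a leading column of ones for the intercept.
X_design = np.column_stack([np.ones(n), X])

# Ordinary least squares coefficients via numpy's least-squares solver.
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Residuals, residual degrees of freedom and residual standard error.
resid = y - X_design @ beta
df_resid = n - p - 1
rse = np.sqrt(np.sum(resid ** 2) / df_resid)
print(rse)
```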

References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

[2] numpy Python package: Harris, C.R., Millman, K.J., van der Walt, S.J., et al. (2020). Array programming with NumPy. Nature, 585, 357–362.

statsmodels Python package: Seabold, S., and Perktold, J. (2010). statsmodels: Econometric and Statistical Modeling with Python. Proceedings of the 9th Python in Science Conference.

+