Last Update: February 21, 2022
Linear regression residual standard error in R can be estimated using the stats package lm and summary.lm functions and the sigma value, and serves as a measure of linear regression goodness of fit. The main parameters of the lm function are formula, a model description of the form y ~ x1 + … + xp, and data, a data.frame object containing the model variables. The main parameter of the summary.lm function is object, a previously fitted lm model.
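For reference, the quantity stored as sigma can be written out as below. This is a sketch under the assumption of a model with an intercept and p predictors fitted to n observations; the symbols n, y_i and \hat{y}_i are notation introduced here and do not appear in the R output.

```latex
\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{n - p - 1}}
```

The numerator is the residual sum of squares, and the denominator n - p - 1 is the residual degrees of freedom, which a fitted lm object stores as df.residual.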
As an example, we can estimate the residual standard error from a multiple linear regression of house price explained by lot size and number of bedrooms, using the HousePrices data object included in the AER package [1].
First, we load the AER package, which provides the data [2].
In [1]:
library(AER)
Second, we load the HousePrices data object from the AER package using the data function and print the first six rows and first three columns using the head function to view the data.frame structure.
In [2]:
data(HousePrices)
head(HousePrices[,1:3])
Out [2]:
  price lotsize bedrooms
1 42000    5850        3
2 38500    4000        2
3 49500    3060        3
4 60500    6650        3
5 61000    6360        2
6 66000    4160        3
Third, we fit the model with the lm function using variables from the HousePrices data object and store the result in the mlr object. Within the lm function, the parameter formula = price ~ lotsize + bedrooms fits a model in which house price is explained by lot size and number of bedrooms.
In [3]:
mlr <- lm(formula = price ~ lotsize + bedrooms, data = HousePrices)
Fourth, we can print the mlr model summary, which includes the estimated residual standard error, using the summary.lm function.
In [4]:
summary.lm(mlr)
Out [4]:
Call:
lm(formula = price ~ lotsize + bedrooms, data = HousePrices)
Residuals:
   Min     1Q Median     3Q    Max
-65665 -12498  -2075   8970  97205
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.613e+03  4.103e+03   1.368    0.172
lotsize     6.053e+00  4.243e-01  14.265  < 2e-16 ***
bedrooms    1.057e+04  1.248e+03   8.470 2.31e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 21230 on 543 degrees of freedom
Multiple R-squared: 0.3703, Adjusted R-squared: 0.3679
F-statistic: 159.6 on 2 and 543 DF, p-value: < 2.2e-16
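The summary line reports the residual standard error together with its degrees of freedom. The short sketch below illustrates where the degrees of freedom come from and why the printed value (21230) differs slightly from the stored one (21229.05). To stay runnable without the AER package, it assumes the built-in mtcars data as a stand-in; the variables mpg, wt and hp and the object name fit are illustrative choices, not part of the example above.

```r
# Stand-in data: mtcars ships with base R (the article itself uses HousePrices).
fit <- lm(mpg ~ wt + hp, data = mtcars)

n_obs  <- nobs(fit)          # observations used in the fit
n_coef <- length(coef(fit))  # intercept plus two slopes
df_res <- df.residual(fit)   # residual degrees of freedom shown by summary()

# Residual degrees of freedom = observations minus estimated coefficients
n_obs - n_coef == df_res

# With R's default digits option (7), the summary printout shows sigma
# rounded to 4 significant digits, so 21229.05 is displayed as 21230.
signif(21229.05, 4)
```

Applied to the house price model, the same relationship gives 543 residual degrees of freedom for three estimated coefficients.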
Fifth, we can also store the model summary in the smlr object using the summary.lm function and print its sigma value, which holds the estimated residual standard error.
In [5]:
smlr <- summary.lm(mlr)
smlr$sigma
Out [5]:
[1] 21229.05
Sixth, we can additionally compute the mlr model's estimated residual standard error directly, using the sqrt and sum functions together with the model's residuals and df.residual values.
In [6]:
sqrt(sum(mlr$residuals^2)/mlr$df.residual)
Out [6]:
[1] 21229.05
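As a cross-check, the routes shown above can be compared against each other, along with two alternatives from the stats package: the sigma extractor function (available since R 3.3.0) and deviance, which returns the residual sum of squares for an lm fit. This is a minimal sketch that assumes the built-in mtcars data as a stand-in so it runs without AER; the object names are hypothetical.

```r
# Stand-in data: mtcars ships with base R (the article itself uses HousePrices).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# summary() dispatches to summary.lm for objects of class "lm"
rse_summary  <- summary(fit)$sigma

# Extractor function from the stats package
rse_sigma    <- sigma(fit)

# Manual calculation using accessor functions instead of $ indexing
rse_manual   <- sqrt(sum(residuals(fit)^2) / df.residual(fit))

# deviance() of an lm fit is its residual sum of squares
rse_deviance <- sqrt(deviance(fit) / df.residual(fit))

# All four should agree up to floating point tolerance
all.equal(rse_summary, rse_sigma)
all.equal(rse_summary, rse_manual)
all.equal(rse_summary, rse_deviance)
```

Using accessor functions such as residuals and df.residual rather than mlr$residuals and mlr$df.residual is a matter of style here; both return the same values for an unweighted lm fit.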
References
[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.
Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.
[2] AER R Package. Christian Kleiber and Achim Zeileis. (2008). Applied Econometrics with R. Springer-Verlag, New York.