
Linear Regression: Coefficients Analysis

Last Update: February 21, 2022

Linear Regression: Coefficients Analysis is used to analyze the linear relationship between one dependent variable y and two or more independent variables x_{1}...x_{p}. Variable y is also known as the target or response feature, and variables x_{1}...x_{p} are also known as predictor features. It is also used to evaluate whether adding each independent variable individually improves the linear regression model.

As an example, we can fit a multiple linear regression with two independent variables using the formula \hat{y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{1i}+\hat{\beta}_{2}x_{2i}\;(1). Regression fitted values \hat{y}_{i} are the estimated y_{i} values. The estimated constant coefficient \hat{\beta}_{0} is the \hat{y} value when x_{1}=0 and x_{2}=0. The estimated partial regression coefficient \hat{\beta}_{1} is the estimated change in y when x_{1} changes by one unit while holding x_{2} constant. Similarly, the estimated partial regression coefficient \hat{\beta}_{2} is the estimated change in y when x_{2} changes by one unit while holding x_{1} constant.
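Below is a minimal Python sketch of estimating these coefficients with the ordinary least squares formula \hat{\beta}=(x'x)^{-1}x'y. The data values, variable names, and NumPy usage are illustrative assumptions, not the Microsoft Excel® workflow shown in Table 1 below.

    import numpy as np

    # Hypothetical example data: n = 6 observations, p = 2 independent variables.
    x1 = np.array([2.0, 3.0, 5.0, 7.0, 8.0, 10.0])
    x2 = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 4.0])
    y = np.array([10.1, 13.8, 17.2, 23.5, 27.0, 30.9])

    # Independent variables matrix x with a constant term column of ones,
    # dimension n x (p + 1).
    x = np.column_stack([np.ones_like(x1), x1, x2])

    # Ordinary least squares coefficient estimates: beta_hat = (x'x)^{-1} x'y.
    beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
    print(beta_hat)  # [beta0_hat, beta1_hat, beta2_hat] as in formula (1)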

Then, we can estimate the standard error of coefficient k with formula se_{\hat{\beta}_{k}}=\sqrt{ms_{res}\,diag((x'x)^{-1})_{k}}\;(2), the square root of the residual mean squared error ms_{res} multiplied by the k-th element of the principal diagonal of matrix (x'x)^{-1}.

The residual mean squared error ms_{res}, with formula ms_{res}=\frac{ss_{res}}{df_{res}}\;(3), is estimated as the residual sum of squares ss_{res} divided by the residual degrees of freedom df_{res}. The residual sum of squares ss_{res}, with formula ss_{res}=\sum_{i=1}^{n}\hat{e}_{i}^{2}\;(4), is estimated as the sum of the squared regression residuals \hat{e}_{i}. The regression residuals \hat{e}_{i}, with formula \hat{e}_{i}=y_{i}-\hat{y}_{i}\;(5), are estimated as the differences between actual y_{i} and fitted \hat{y}_{i} values. The residual degrees of freedom df_{res}, with formula df_{res}=n-p-1\;(6), are the number of observations n minus the number of independent variables p minus one for the constant term.
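Continuing the hypothetical sketch above, the residual quantities in formulas (3) to (6) can be computed as follows, again assuming NumPy and the same illustrative data:

    import numpy as np

    # Same hypothetical data and fit as in the previous sketch.
    x1 = np.array([2.0, 3.0, 5.0, 7.0, 8.0, 10.0])
    x2 = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 4.0])
    y = np.array([10.1, 13.8, 17.2, 23.5, 27.0, 30.9])
    x = np.column_stack([np.ones_like(x1), x1, x2])
    beta_hat = np.linalg.solve(x.T @ x, x.T @ y)

    # Regression residuals (5): actual minus fitted values.
    e_hat = y - x @ beta_hat
    # Residual sum of squares (4).
    ss_res = np.sum(e_hat ** 2)
    # Residual degrees of freedom (6): n observations, p predictors, one constant.
    n, p = len(y), 2
    df_res = n - p - 1
    # Residual mean squared error (3).
    ms_res = ss_res / df_res
    print(ms_res)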

Matrix (x'x)^{-1}, with dimension (p+1) x (p+1), is the inverse of the matrix product between the transpose of matrix x and matrix x. Matrix x, with dimension n x (p+1), is the independent variables matrix including a column of ones for the constant term.
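Putting formulas (2) to (6) together, here is a minimal sketch of the standard error computation, once more with the hypothetical data from the previous sketches:

    import numpy as np

    # Same hypothetical data and fit as in the previous sketches.
    x1 = np.array([2.0, 3.0, 5.0, 7.0, 8.0, 10.0])
    x2 = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 4.0])
    y = np.array([10.1, 13.8, 17.2, 23.5, 27.0, 30.9])
    x = np.column_stack([np.ones_like(x1), x1, x2])
    beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
    ms_res = np.sum((y - x @ beta_hat) ** 2) / (len(y) - 2 - 1)

    # (x'x)^{-1} has dimension (p + 1) x (p + 1); its principal diagonal
    # scales ms_res into the coefficient standard errors of formula (2).
    xtx_inv = np.linalg.inv(x.T @ x)
    se_beta = np.sqrt(ms_res * np.diag(xtx_inv))
    print(se_beta)  # standard errors of [beta0_hat, beta1_hat, beta2_hat]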

Next, we can estimate the coefficient k t-statistic with formula t_{\hat{\beta}_{k}}=\frac{\hat{\beta}_{k}}{se_{\hat{\beta}_{k}}}\;(7) and perform a t-test with the individual null hypothesis that the independent variable x_{k} coefficient is equal to zero, with formula H_{0}:\beta_{k}=0\;(8). Under this null hypothesis, t_{\hat{\beta}_{k}} follows a Student's t distribution with df_{res} degrees of freedom, so the hypothesis is rejected when the associated p-value is below the chosen significance level. If the individual null hypothesis is rejected, then adding independent variable x_{k} improves the linear regression model.
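A minimal sketch of the t-statistics of formula (7) and their two-sided p-values, assuming SciPy's Student's t distribution and the same hypothetical data as above:

    import numpy as np
    from scipy import stats

    # Same hypothetical data and quantities as in the previous sketches.
    x1 = np.array([2.0, 3.0, 5.0, 7.0, 8.0, 10.0])
    x2 = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 4.0])
    y = np.array([10.1, 13.8, 17.2, 23.5, 27.0, 30.9])
    x = np.column_stack([np.ones_like(x1), x1, x2])
    beta_hat = np.linalg.solve(x.T @ x, x.T @ y)

    n, p = len(y), 2
    df_res = n - p - 1
    ms_res = np.sum((y - x @ beta_hat) ** 2) / df_res
    se_beta = np.sqrt(ms_res * np.diag(np.linalg.inv(x.T @ x)))

    # t-statistics (7) and two-sided p-values for H0: beta_k = 0 (8).
    t_stats = beta_hat / se_beta
    p_values = 2 * stats.t.sf(np.abs(t_stats), df_res)
    print(t_stats)
    print(p_values)  # reject H0 where p-value < chosen significance level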

Below, we find an example of coefficients analysis from a multiple linear regression of house price explained by its lot size and number of bedrooms [1].

Table 1. Microsoft Excel® coefficients analysis from multiple linear regression of house price explained by its lot size and number of bedrooms.

Courses

My online courses are hosted on the Teachable website.

For more details on this concept, you can view my Linear Regression Courses.

References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

Source: HousePrices object in the AER R package. Kleiber, C., and Zeileis, A. (2008). Applied Econometrics with R. Springer-Verlag, New York.
