# Linear Regression: Coefficients Analysis

Last Update: February 21, 2022

Linear Regression: Coefficients Analysis is used to analyze the linear relationship between one dependent variable $y$ and two or more independent variables $x_{1},\dots,x_{p}$. The variable $y$ is also known as the target or response feature, and the variables $x_{1},\dots,x_{p}$ are also known as predictor features. Coefficients analysis is also used to evaluate whether adding each independent variable individually improves the linear regression model.

As an example, we can fit a multiple linear regression with two independent variables using formula $\hat{y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{1i}+\hat{\beta}_{2}x_{2i}\;(1)$. Regression fitted values $\hat{y}_{i}$ are the estimated $y_{i}$ values. The estimated constant coefficient $\hat{\beta}_{0}$ is the $\hat{y}$ value when $x_{1}=0$ and $x_{2}=0$. The estimated partial regression coefficient $\hat{\beta}_{1}$ is the estimated change in $y$ when $x_{1}$ changes by one unit while holding $x_{2}$ constant. Similarly, the estimated partial regression coefficient $\hat{\beta}_{2}$ is the estimated change in $y$ when $x_{2}$ changes by one unit while holding $x_{1}$ constant.
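The fit in equation (1) can be sketched with NumPy via the normal equations, $\hat{\beta}=(x'x)^{-1}x'y$. This is a minimal illustration, not the article's worked example: the data here are synthetic, and the coefficient values $(2.0, 1.5, -0.5)$ and noise scale are assumptions chosen for the demonstration.

```python
import numpy as np

# Synthetic data for illustration (coefficient and noise values are assumptions):
# y is generated from known coefficients plus a small noise term.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix x with dimension n x (p+1): a constant column of ones, then x1, x2
X = np.column_stack([np.ones(n), x1, x2])

# Estimated coefficients via the normal equations: beta_hat = (x'x)^{-1} x'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Fitted values, equation (1)
y_hat = X @ beta_hat
```

With the small noise term used here, `beta_hat` recovers the generating coefficients closely; solving the linear system with `np.linalg.solve` avoids explicitly inverting $x'x$, which is numerically preferable.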

Then, we can estimate the coefficient $k$ standard error with formula $se_{\hat{\beta}_{k}}=\sqrt{ms_{res}\,diag((x'x)^{-1})_{k}}\;(2)$ as the square root of the residual mean squared error $ms_{res}$ multiplied by the $k$-th element of the principal diagonal of matrix $(x'x)^{-1}$.

The residual mean squared error $ms_{res}$ with formula $ms_{res}=\frac{ss_{res}}{df_{res}}\;(3)$ is estimated as the residual sum of squares $ss_{res}$ divided by the residual degrees of freedom $df_{res}$. The residual sum of squares $ss_{res}$ with formula $ss_{res}=\sum_{i=1}^{n}\hat{e}_{i}^{2}\;(4)$ is estimated as the sum of squared regression residuals $\hat{e}_{i}$. Regression residuals $\hat{e}_{i}$ with formula $\hat{e}_{i}=y_{i}-\hat{y}_{i}\;(5)$ are estimated as the differences between actual $y_{i}$ and fitted $\hat{y}_{i}$ values. The residual degrees of freedom $df_{res}$ with formula $df_{res}=n-p-1\;(6)$ are the number of observations $n$ minus the number of independent variables $p$ minus one for the constant term.
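Equations (2) through (6) can be chained together in a short sketch. As before, the data are synthetic and the generating coefficients are assumptions for illustration only.

```python
import numpy as np

# Synthetic two-predictor regression (data and coefficient values are assumptions)
rng = np.random.default_rng(0)
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([2.0, 1.5, -0.5]) + rng.normal(scale=0.1, size=n)

# Estimated coefficients via the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

e_hat = y - X @ beta_hat      # regression residuals, equation (5)
ss_res = np.sum(e_hat ** 2)   # residual sum of squares, equation (4)
df_res = n - p - 1            # residual degrees of freedom, equation (6)
ms_res = ss_res / df_res      # residual mean squared error, equation (3)

# Coefficient standard errors, equation (2):
# square root of ms_res times the principal diagonal of (x'x)^{-1}
xtx_inv = np.linalg.inv(X.T @ X)
se = np.sqrt(ms_res * np.diag(xtx_inv))
```

The vector `se` holds one standard error per coefficient, constant term included, matching the $(p+1)$ rows of the design matrix.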

Matrix $(x'x)^{-1}$ with dimension $(p+1)\times(p+1)$ is the inverse of the matrix product between the transpose of matrix $x$ and matrix $x$. Matrix $x$ with dimension $n\times(p+1)$ is the matrix of independent variables including a constant-term column of ones.

Next, we can estimate the coefficient $k$ t-statistic with formula $t_{\hat{\beta}_{k}}=\frac{\hat{\beta}_{k}}{se_{\hat{\beta}_{k}}}\;(7)$ and do a t-test with the individual null hypothesis that the independent variable $x_{k}$ coefficient is equal to zero with formula $H_{0}:\beta_{k}=0\;(8)$. If the individual null hypothesis is rejected, then adding independent variable $x_{k}$ improves the linear regression model.
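The t-test in equations (7) and (8) can be sketched by dividing each estimated coefficient by its standard error and comparing the result against the Student's t distribution with $df_{res}$ degrees of freedom. The synthetic data and coefficient values below are assumptions carried over for illustration; `scipy.stats.t.sf` supplies the upper-tail probability.

```python
import numpy as np
from scipy import stats

# Synthetic two-predictor regression (data and coefficient values are assumptions)
rng = np.random.default_rng(0)
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([2.0, 1.5, -0.5]) + rng.normal(scale=0.1, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat
df_res = n - p - 1
ms_res = np.sum(e_hat ** 2) / df_res
se = np.sqrt(ms_res * np.diag(np.linalg.inv(X.T @ X)))

# t-statistics, equation (7)
t_stat = beta_hat / se

# Two-sided p-values for the individual null hypotheses H0: beta_k = 0, equation (8)
p_values = 2 * stats.t.sf(np.abs(t_stat), df=df_res)
```

A p-value below the chosen significance level (commonly 0.05) rejects the individual null hypothesis for that coefficient; here all coefficients are generated far from zero, so all three tests reject.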

Below, we find an example of coefficients analysis from a multiple linear regression of house price explained by its lot size and number of bedrooms [1].

Courses

My online courses are hosted on the Teachable website.

For more details on this concept, you can view my Linear Regression Courses.

References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

Source: AER R Package HousePrices Object. Christian Kleiber and Achim Zeileis. (2008). Applied Econometrics with R. Springer-Verlag, New York.
