Multiple Linear Regression – Data Science Concepts

Last Update: February 21, 2022

Multiple linear regression models the linear relationship between one dependent (or explained) variable $y$ and two or more independent (or explanatory) variables $x_{1},\ldots,x_{p}$. The variable $y$ is also known as the target or response feature, and the variables $x_{1},\ldots,x_{p}$ are also known as predictor features.

As an example, we can fit a multiple linear regression model with two explanatory variables using the formula $\hat{y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{1i}+\hat{\beta}_{2}x_{2i}\;(1)$. The hat (^) symbol in this notation marks quantities that are estimates. The regression fitted values $\hat{y}_{i}$ are the estimated $y_{i}$ values. The estimated constant coefficient $\hat{\beta}_{0}$ is the value of $\hat{y}$ when $x_{1}=0$ and $x_{2}=0$. The estimated partial regression coefficient $\hat{\beta}_{1}$ is the estimated change in $y$ when $x_{1}$ increases by one unit while $x_{2}$ is held constant. Similarly, the estimated partial regression coefficient $\hat{\beta}_{2}$ is the estimated change in $y$ when $x_{2}$ increases by one unit while $x_{1}$ is held constant.
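The fitted-value formula and the meaning of the partial coefficients can be checked with a few lines of Python. This is a minimal sketch: the function name `fitted_value` and the coefficient values are hypothetical placeholders chosen for illustration, not estimates from any data set. It confirms that $\hat{y}=\hat{\beta}_{0}$ at $x_{1}=x_{2}=0$, and that a one-unit increase in one predictor, with the other held fixed, moves $\hat{y}$ by exactly that predictor's coefficient.

```python
def fitted_value(b0: float, b1: float, b2: float, x1: float, x2: float) -> float:
    """Return the fitted value y_hat = b0 + b1*x1 + b2*x2, as in equation (1)."""
    return b0 + b1 * x1 + b2 * x2

# Hypothetical coefficient estimates, used only for illustration.
b0, b1, b2 = 2.0, 0.5, 3.0

# With x1 = 0 and x2 = 0, the fitted value equals the constant coefficient b0.
at_origin = fitted_value(b0, b1, b2, 0.0, 0.0)

# Increase x1 by one unit while holding x2 constant: y_hat changes by b1.
change_x1 = fitted_value(b0, b1, b2, 11.0, 4.0) - fitted_value(b0, b1, b2, 10.0, 4.0)

# Increase x2 by one unit while holding x1 constant: y_hat changes by b2.
change_x2 = fitted_value(b0, b1, b2, 10.0, 5.0) - fitted_value(b0, b1, b2, 10.0, 4.0)

print(at_origin, change_x1, change_x2)  # 2.0 0.5 3.0
```

Because equation (1) is linear, the change does not depend on the starting values of $x_{1}$ and $x_{2}$ (10 and 4 here are arbitrary).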

Model fitting can be done using the ordinary least squares (OLS) method, which chooses the coefficient estimates that minimize the sum of squared regression residuals: $\min_{\hat{\beta}_{0},\hat{\beta}_{1},\hat{\beta}_{2}}\sum_{i=1}^{n}\hat{e}_{i}^{2}\;(2)$. The regression residuals $\hat{e}_{i}=y_{i}-\hat{y}_{i}\;(3)$ are the differences between the actual values $y_{i}$ and the fitted values $\hat{y}_{i}$.
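The least squares fit in equations (2) and (3) can be sketched with NumPy. This assumes NumPy is available and uses a small synthetic data set (generated from $y = 1 + 2x_{1} - 0.5x_{2}$ plus a fixed noise vector), not the house price data. The helper names `design_matrix` and `sum_squared_residuals` are hypothetical. `np.linalg.lstsq` returns the coefficient vector that minimizes the sum of squared residuals, which is equivalent to solving the normal equations $X^{\top}X\hat{\beta}=X^{\top}y$, where $X$ has a column of ones for $\hat{\beta}_{0}$ followed by the predictor columns.

```python
import numpy as np

def design_matrix(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
    """Column of ones for the constant b0, followed by the two predictors."""
    return np.column_stack([np.ones_like(x1), x1, x2])

def sum_squared_residuals(beta: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    """Objective of equation (2): sum of squared residuals e_i = y_i - y_hat_i."""
    residuals = y - X @ beta  # X @ beta gives the fitted values of equation (1)
    return float(np.sum(residuals ** 2))

# Synthetic illustration data (NOT the house price data from Table 1).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
noise = np.array([0.1, -0.2, 0.05, 0.15, -0.1, 0.0])
y = 1.0 + 2.0 * x1 - 0.5 * x2 + noise

X = design_matrix(x1, x2)
# lstsq returns the beta that minimizes ||y - X beta||^2 (the OLS estimates).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
ssr_min = sum_squared_residuals(beta_hat, X, y)

# Nudging a coefficient away from the OLS estimates increases the objective.
ssr_nudged = sum_squared_residuals(beta_hat + np.array([0.0, 0.01, 0.0]), X, y)

# At the minimum, residuals are orthogonal to every column of X (normal equations).
residuals = y - X @ beta_hat
print("estimates:", beta_hat)
print("SSR at OLS:", ssr_min, "SSR nudged:", ssr_nudged)
print("X^T e:", X.T @ residuals)
```

The orthogonality check is one way to confirm a minimum has been reached: because the column of ones is part of $X$, it also implies the residuals sum to (numerically) zero.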

Below is an example of the estimated coefficients from a multiple linear regression of house price on lot size and number of bedrooms [1].

Table 1. Coefficients estimated in Microsoft Excel® from the multiple linear regression of house price on lot size and number of bedrooms.

Courses

My online courses are hosted on the Teachable website.

For more details on this concept, see my Linear Regression Courses.

References

[1] Data description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September 1987.

Original source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

Source: Kleiber, C., and Zeileis, A. (2008). Applied Econometrics with R. Springer-Verlag, New York. AER R package, HousePrices data set.
