Simple Linear Regression – Data Science Concepts

Last Update: February 21, 2022

Simple linear regression models the linear relationship between two variables, $y$ and $x$. The dependent variable $y$ is the one being explained; it is also known as the target or response variable. The independent variable $x$ is the explanatory one; it is also known as the predictor.

When doing simple linear regression, we can start by drawing a scatter chart with $y$ on the vertical axis and $x$ on the horizontal axis. Then, we can draw a line that describes the linear relationship between $y$ and $x$. This line is the fitted model, with formula $\hat{y}_{i} = \hat{\beta}_{0} + \hat{\beta}_{1} x_{i} \; (1)$ . The hat (^) in the notation marks quantities that are estimates. The regression fitted values $\hat{y}_{i}$ are the estimated $y_{i}$ values. The estimated constant or intercept coefficient $\hat{\beta}_{0}$ is the value of $\hat{y}$ when $x=0$, that is, the point where the line crosses the vertical axis. The estimated slope coefficient $\hat{\beta}_{1}$ is the estimated change in $\hat{y}$ when $x$ increases by one unit.
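As a minimal sketch of formula (1), the snippet below computes fitted values from a given intercept and slope. The function name `fitted_values` and the coefficient values are illustrative assumptions, not estimates from any real data.

```python
def fitted_values(beta0_hat, beta1_hat, xs):
    """Apply formula (1): y_hat_i = beta0_hat + beta1_hat * x_i for each x_i."""
    return [beta0_hat + beta1_hat * x for x in xs]


# Hypothetical coefficients, chosen only for illustration.
beta0_hat = 2.0  # intercept: value of y_hat when x = 0
beta1_hat = 0.5  # slope: change in y_hat when x increases by one unit

y_hat = fitted_values(beta0_hat, beta1_hat, [0, 2, 4])
print(y_hat)
```

With these assumed values, the first fitted value equals the intercept because $x=0$, and each increase of two units in $x$ raises $\hat{y}$ by twice the slope.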

The model can be fitted using the ordinary least squares method, with formula $\min_{\hat{\beta}_{0}, \hat{\beta}_{1}} \; \sum_{i=1}^{n} \hat{e}_{i}^{2} \; (2)$ . This method chooses the coefficients that minimize the sum of squared estimated regression residuals $\hat{e}_{i}$ . The estimated regression residuals, with formula $\hat{e}_{i} = y_{i} - \hat{y}_{i} \; (3)$ , are the differences between the actual values $y_{i}$ and the fitted values $\hat{y}_{i}$ .
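Formulas (2) and (3) can be sketched in Python as follows. For simple linear regression, the least squares minimization has a well-known closed-form solution, $\hat{\beta}_{1} = \frac{\sum_{i=1}^{n} (x_{i} - \bar{x})(y_{i} - \bar{y})}{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}}$ and $\hat{\beta}_{0} = \bar{y} - \hat{\beta}_{1} \bar{x}$, which the sketch uses. The function names `ols_fit` and `residuals` and the toy data are hypothetical; they are not the house price data.

```python
from statistics import mean


def ols_fit(xs, ys):
    """Return (beta0_hat, beta1_hat) that minimize the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x must take at least two distinct values")
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


def residuals(xs, ys, beta0_hat, beta1_hat):
    """Apply formula (3): e_hat_i = y_i - y_hat_i."""
    return [y - (beta0_hat + beta1_hat * x) for x, y in zip(xs, ys)]


# Made-up toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

beta0_hat, beta1_hat = ols_fit(xs, ys)
e_hat = residuals(xs, ys, beta0_hat, beta1_hat)
ssr = sum(e ** 2 for e in e_hat)  # the quantity minimized in formula (2)
print(beta0_hat, beta1_hat, ssr)
```

One way to check the fit is to nudge either coefficient away from its estimate and recompute the sum of squared residuals: any other line gives a larger sum on the same data.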

Below is an example of a scatter chart with a simple linear regression of house price explained by lot size [1].

Figure 1. Simple linear regression scatter chart of house price explained by its lot size.

References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

Source: AER R package HousePrices object. Christian Kleiber and Achim Zeileis. (2008). Applied Econometrics with R. Springer-Verlag, New York.
