Normality in Error Term: Q-Q Plot and Jarque-Bera Test

Last Update: February 21, 2022

Normality in the error term means that linear regression errors are normally distributed. It can be checked visually with a Q-Q plot, or quantile-quantile plot, which compares the sample quantiles of the regression residuals with the theoretical quantiles of a normal distribution and evaluates whether the resulting points lie along a line fitted to them. If the points deviate from their fitted line, then the model errors are assumed not to be normally distributed. It can also be tested with the Jarque-Bera test [1], which evaluates whether the skewness and excess kurtosis of the model residuals are jointly equal to zero. If the residuals skewness and excess kurtosis differ significantly from zero, then the model errors are assumed not to be normally distributed.

As an example, we can fit a three-variable multiple linear regression with formula \hat{y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{1i}+\hat{\beta}_{2}x_{2i}\;(1). Then, we can obtain the regression residuals \hat{e}_{i} with formula \hat{e}_{i}=y_{i}-\hat{y}_{i}\;(2), which are the estimated differences between the actual y_{i} and fitted \hat{y}_{i} values.
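Formulas (1) and (2) can be sketched in a few lines of Python. The sketch below is only illustrative: it uses the standard library, solves the least squares normal equations directly, and runs on synthetic data generated inside the block, not on the house price data [2]. The function names `solve_linear_system`, `fit_ols` and `residuals` are hypothetical helpers introduced here.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(y, x1, x2):
    """Return [beta0, beta1, beta2] of regression (1) by least squares."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def residuals(y, x1, x2, beta):
    """Return e_i = y_i - y_hat_i as in formula (2)."""
    b0, b1, b2 = beta
    return [yi - (b0 + b1 * a + b2 * b) for yi, a, b in zip(y, x1, x2)]


# Synthetic data (an assumption for illustration only).
rng = random.Random(0)
x1 = [rng.uniform(0, 10) for _ in range(200)]
x2 = [rng.uniform(0, 5) for _ in range(200)]
y = [1.0 + 2.0 * a - 0.5 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

beta_hat = fit_ols(y, x1, x2)
e_hat = residuals(y, x1, x2, beta_hat)
print("estimated coefficients:", beta_hat)
```

Because regression (1) includes an intercept, the residuals sum to zero up to rounding error, which is a quick sanity check on the fit. In practice a numerical library would normally replace the hand-written solver.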

Next, we can build the Q-Q plot. The vertical axis holds the residuals in ascending order as sample quantiles, and the horizontal axis holds the inverse of the standard normal cumulative distribution function, evaluated at the percentiles of the ascending order ranks, as theoretical quantiles. After that, we can fit a regression line through the points and visually check whether the points depart from it. If the points depart from their fitted line, then the errors of regression (1) are assumed not to be normally distributed. Notice that the Q-Q plot reference line can be a regression line fitted to the points, a line fitted to the quantiles or the 45-degree line.
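The Q-Q plot coordinates and the fitted reference line can be computed as in the following sketch. It assumes rank percentiles of the form (i - 0.5) / n, which is one common plotting position but is not stated in the text, and it uses simulated values as a stand-in for the regression residuals. The helper names `qq_points` and `fit_line` are hypothetical, and no chart is drawn; the two returned lists are the horizontal and vertical coordinates of the points.

```python
import random
from statistics import NormalDist, mean


def qq_points(resid):
    """Return (theoretical quantiles, sample quantiles) for a normal Q-Q plot."""
    n = len(resid)
    sample_q = sorted(resid)  # residuals in ascending order
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting position: percentile of rank i is (i - 0.5) / n.
    theo_q = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theo_q, sample_q


def fit_line(x, y):
    """Return (intercept, slope) of the least squares line through the points."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Simulated stand-in residuals (an assumption for illustration only).
rng = random.Random(1)
resid = [rng.gauss(0, 2) for _ in range(500)]

theo_q, sample_q = qq_points(resid)
intercept, slope = fit_line(theo_q, sample_q)
print("reference line intercept and slope:", intercept, slope)
```

For normally distributed residuals, the slope of the fitted line is roughly the residual standard deviation and the intercept is roughly the residual mean, so points that bend away from the line at the tails suggest skewness or heavy tails.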

Below is an example of a residuals normal Q-Q plot from a multiple linear regression of house price explained by its lot size and number of bedrooms [2].

Figure 1. Microsoft Excel® residuals normal Q-Q plot from multiple linear regression of house price explained by its lot size and number of bedrooms.

Later, we can run the Jarque-Bera test with test statistic formula jb=\frac{n}{6}(s^2+\frac{1}{4}(k-3)^2)\;(3), where n is the number of residual observations, s is the residuals sample skewness and (k-3) is the residuals sample excess kurtosis. The statistic is compared against a chi-square distribution with two degrees of freedom under the joint null hypothesis that residuals skewness and excess kurtosis are equal to zero, with formula H_{0}:s=(k-3)=0\;(4). If the joint null hypothesis is rejected, then the errors of regression (1) are assumed not to be normally distributed.
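Formula (3) and its chi-square test can be sketched as follows. This is a minimal illustration under stated assumptions: skewness and kurtosis are computed from central moments divided by n, which may differ from the bias-adjusted estimators some spreadsheet functions use, and the p-value uses the closed form exp(-jb/2) for the upper tail of a chi-square distribution with two degrees of freedom. The function name `jarque_bera` and the simulated samples are hypothetical and are not the house price residuals [2].

```python
import math
import random


def jarque_bera(resid):
    """Return (jb, p_value, skewness, kurtosis) for formula (3)."""
    n = len(resid)
    center = sum(resid) / n
    # Central moments divided by n (assumed moment definitions).
    m2 = sum((e - center) ** 2 for e in resid) / n
    m3 = sum((e - center) ** 3 for e in resid) / n
    m4 = sum((e - center) ** 4 for e in resid) / n
    s = m3 / m2 ** 1.5  # sample skewness
    k = m4 / m2 ** 2    # sample kurtosis, so k - 3 is excess kurtosis
    jb = n / 6 * (s ** 2 + 0.25 * (k - 3) ** 2)
    # Upper tail of a chi-square with 2 degrees of freedom is exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value, s, k


# Simulated samples (an assumption for illustration only).
rng = random.Random(2)
normal_resid = [rng.gauss(0, 1) for _ in range(500)]
skewed_resid = [rng.expovariate(1.0) - 1.0 for _ in range(500)]

for label, sample in (("normal", normal_resid), ("skewed", skewed_resid)):
    jb, p_value, s, k = jarque_bera(sample)
    print(label, "jb =", jb, "p-value =", p_value)
```

A p-value below the chosen significance level, for example 0.05, leads to rejecting the joint null hypothesis (4). The chi-square approximation is asymptotic, so it is less reliable for small samples.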

Below is an example of a residuals Jarque-Bera test from a multiple linear regression of house price explained by its lot size and number of bedrooms [2].

Table 1. Microsoft Excel® residuals Jarque-Bera test from multiple linear regression of house price explained by its lot size and number of bedrooms.

References

[1] Jarque, Carlos M.; Bera, Anil K. (1987). “A test for normality of observations and regression residuals”. International Statistical Review. 55 (2): 163–172.

[2] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

Source: AER R Package HousePrices Object. Christian Kleiber and Achim Zeileis. (2008). Applied Econometrics with R. Springer-Verlag, New York.
