Normality in Error Term: Q-Q Plot and Jarque-Bera Test in R

A normal Q-Q plot of the error term can be drawn in R with the `ggplot2` package functions `ggplot`, `stat_qq` and `stat_qq_line`. The plot compares the sample quantiles of the linear regression residuals with the theoretical quantiles of a normal distribution; residuals that are approximately normal fall close to the reference line. The main parameters of `ggplot` are `data`, the data frame of model residuals, and `aes`, the aesthetic mappings of its variables. The main parameter of `stat_qq` is `distribution`, the quantile function of the probability distribution to compare against. The main parameters of `stat_qq_line` are `distribution`, the same quantile function, and `line.p`, the vector of two probabilities whose quantile points define the reference line.

The Jarque-Bera test can be run with the `tseries` package function `jarque.bera.test`. It tests the joint null hypothesis that the skewness and excess kurtosis of the linear regression residuals are both zero, as they are for a normal distribution. The main parameter of `jarque.bera.test` is `x`, a numeric vector of model residuals.
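For reference, the Jarque-Bera statistic is commonly written as below, where $n$ is the number of residuals $e_i$, $S$ is the sample skewness and $K$ is the sample kurtosis, so that $K - 3$ is the excess kurtosis. The symbols are notation introduced here for illustration and are not taken from the `tseries` documentation. Under the null hypothesis of normality, the statistic is asymptotically chi-squared with 2 degrees of freedom.

```latex
JB = \frac{n}{6}\left(S^2 + \frac{(K - 3)^2}{4}\right),
\qquad
S = \frac{\hat{\mu}_3}{\hat{\mu}_2^{3/2}},
\quad
K = \frac{\hat{\mu}_4}{\hat{\mu}_2^{2}},
\quad
\hat{\mu}_k = \frac{1}{n}\sum_{i=1}^{n}\left(e_i - \bar{e}\right)^k
```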

As an example, we draw the residuals Q-Q plot and run the Jarque-Bera test for a multiple linear regression of house price on lot size and number of bedrooms, using the `HousePrices` data set included in the `AER` package [1].

First, we load the packages: `AER` for the data, `ggplot2` for the Q-Q plot and `tseries` for the Jarque-Bera test [2].

``````In [1]:
library(AER)
library(ggplot2)
library(tseries)
``````

Second, we load the `HousePrices` data object from the `AER` package using the `data` function and print its first six rows and first three columns using the `head` function to view the `data.frame` structure.

``````In [2]:
data(HousePrices)
head(HousePrices[, 1:3])
``````
``````Out [2]:
  price lotsize bedrooms
1 42000    5850        3
2 38500    4000        2
3 49500    3060        3
4 60500    6650        3
5 61000    6360        2
6 66000    4160        3
``````

Third, we fit the multiple linear regression using the `lm` function and store the result in the `mlr` object. Within `lm`, the parameter `formula = price ~ lotsize + bedrooms` specifies a model in which house price is explained by lot size and number of bedrooms, and `data = HousePrices` supplies the data frame.

``````In [3]:
mlr <- lm(formula = price ~ lotsize + bedrooms, data = HousePrices)
``````

Fourth, we extract the residuals from the `mlr` multiple linear regression object and store them in the `res` object.

``````In [4]:
res <- mlr$residuals
``````
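As a small aside, the generic `residuals` function is an alternative to `$residuals` for extracting residuals. The sketch below uses the built-in `mtcars` data as a stand-in, an assumption made so that it runs without `AER`; `fit`, `res_dollar` and `res_generic` are illustrative names only.

```r
# Stand-in model on built-in data (assumption: mtcars replaces HousePrices
# so the snippet runs without extra packages).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Two ways to extract residuals from an lm object.
res_dollar <- fit$residuals
res_generic <- residuals(fit)

# Both give the same numbers here (the data have no missing values).
same_values <- isTRUE(all.equal(res_dollar, res_generic))

# With an intercept in the model, OLS residuals sum to (numerically) zero.
res_sum <- sum(res_generic)

print(same_values)
print(res_sum)
```

Because least squares residuals with an intercept always average to zero, the normality checks that follow concern the shape of their distribution rather than its location.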

Fifth, we draw the normal Q-Q plot using the `ggplot`, `stat_qq` and `stat_qq_line` functions. Within `ggplot`, `data = data.frame(res)` supplies the residuals as a `data.frame`, and `aes(sample = res)` maps them to the `sample` aesthetic required by `stat_qq`. Within `stat_qq`, `distribution = qnorm` sets the normal distribution quantile function. Within `stat_qq_line`, `distribution = qnorm` does the same, `line.p = c(0.25, 0.75)` sets the two probabilities (first and third quartiles) whose quantile points define the reference line, and `color = "red"` draws that line in red. The `labs` function sets the plot title and axis labels.

``````In [5]:
ggplot(data = data.frame(res), aes(sample = res)) +
  stat_qq(distribution = qnorm) +
  stat_qq_line(distribution = qnorm, line.p = c(0.25, 0.75), color = "red") +
  labs(title = "Normal Q-Q Plot", x = "Theoretical Quantiles", y = "Sample Quantiles")
``````
``Out [5]:``
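To make explicit what the plot shows, the following base-R sketch computes the same ingredients by hand: sorted sample quantiles, theoretical normal quantiles, and a reference line through the 0.25 and 0.75 quantile points. It uses simulated residuals (`res_sim`, a hypothetical stand-in for `res`) and assumes that `stat_qq_line` builds its line from the two `line.p` quantile points, as base `qqline` does.

```r
set.seed(1)
res_sim <- rnorm(200, mean = 0, sd = 2)  # hypothetical stand-in for res

n <- length(res_sim)
theoretical <- qnorm(ppoints(n))  # theoretical normal quantiles
sample_q <- sort(res_sim)         # sample quantiles, in increasing order

# Reference line through the points at probabilities 0.25 and 0.75.
probs <- c(0.25, 0.75)
y <- quantile(res_sim, probs, names = FALSE)  # sample quartiles
x <- qnorm(probs)                             # normal quartiles
slope <- diff(y) / diff(x)
intercept <- y[1] - slope * x[1]

print(c(intercept = intercept, slope = slope))
```

Passing `theoretical` and `sample_q` to `plot`, then `intercept` and `slope` to `abline`, would give a base-graphics version of the same picture. For normally distributed data the slope approximates the standard deviation and the intercept approximates the mean.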

Sixth, we run the Jarque-Bera test using the `jarque.bera.test` function. Within `jarque.bera.test`, the parameter `x = res` supplies the residuals as a numeric vector.

``````In [6]:
jarque.bera.test(x = res)
``````
``````Out [6]:
Jarque Bera Test

data:  res
X-squared = 146.85, df = 2, p-value < 2.2e-16
``````
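The reported p-value is far below conventional significance levels such as 0.05, so the null hypothesis of normally distributed residuals is rejected for this model. As a cross-check on what the statistic measures, here is a base-R sketch that computes a Jarque-Bera statistic from sample moments together with its chi-squared p-value. The function name `jb_by_hand` is hypothetical, and the simulated vectors stand in for `res`.

```r
# Moment-based Jarque-Bera statistic (illustrative helper, not part of tseries).
jb_by_hand <- function(x) {
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)          # second central moment
  m3 <- mean(d^3)          # third central moment
  m4 <- mean(d^4)          # fourth central moment
  skew <- m3 / m2^(3 / 2)  # sample skewness
  kurt <- m4 / m2^2        # sample kurtosis (3 for a normal distribution)
  stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
  p_value <- pchisq(stat, df = 2, lower.tail = FALSE)
  list(statistic = stat, p.value = p_value)
}

set.seed(42)
normal_like <- rnorm(500)  # simulated data drawn from a normal distribution
skewed <- rexp(500)        # simulated right-skewed data

jb_normal <- jb_by_hand(normal_like)
jb_skewed <- jb_by_hand(skewed)
print(jb_normal)
print(jb_skewed)
```

Calling `jb_by_hand(res)` on the residuals from the model above should give a statistic close to the `X-squared` value reported by `jarque.bera.test`, assuming that function uses the same moment-based formula.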

References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

[2] AER R Package: Christian Kleiber and Achim Zeileis. (2008). Applied Econometrics with R. Springer-Verlag, New York.

ggplot2 R Package: Hadley Wickham. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York.

tseries R Package: Adrian Trapletti and Kurt Hornik. (2020). tseries: Time Series Analysis and Computational Finance. R package version 0.10-48.

