Last Update: February 21, 2022
Normality in Error Term: Q-Q Plot in R can be done using the ggplot2 package ggplot, stat_qq and stat_qq_line functions to evaluate whether the points pairing the sample quantiles of linear regression residuals with the theoretical quantiles of the normal distribution lie close to the Q-Q reference line. The main parameters of the ggplot function are data, with the data frame of model residuals, and aes, with the aesthetic mappings of variables. The main parameter of the stat_qq function is distribution, with the quantile function of the probability distribution to use. The main parameters of the stat_qq_line function are distribution, with the quantile function of the probability distribution to use, and line.p, with the vector of probabilities whose sample and theoretical quantiles the reference line is drawn through.
Normality in Error Term: Jarque-Bera Test can be done using the tseries package jarque.bera.test function to evaluate whether the skewness and excess kurtosis of linear regression residuals are jointly equal to zero, as they are under a normal distribution. The main parameter of the jarque.bera.test function is x, with the numeric vector of model residuals.
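For reference, the textbook form of the Jarque-Bera statistic can be written as below. The notation is an assumption introduced here and does not come from the package documentation: e_i are the residuals, n is their count, S is sample skewness and K is sample excess kurtosis.

```latex
JB = \frac{n}{6}\left(S^{2} + \frac{K^{2}}{4}\right), \qquad
S = \frac{m_3}{m_2^{3/2}}, \qquad
K = \frac{m_4}{m_2^{2}} - 3, \qquad
m_k = \frac{1}{n}\sum_{i=1}^{n}\left(e_i - \bar{e}\right)^{k}
```

Under the null hypothesis of normality, JB is asymptotically chi-squared distributed with 2 degrees of freedom, so large values of JB are evidence against normality.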
As an example, we can do a residuals Q-Q plot and Jarque-Bera test for the multiple linear regression of house price explained by lot size and number of bedrooms, using the data included in the AER package HousePrices object [1].
First, we load the packages AER for data, ggplot2 for the Q-Q plot and tseries for the Jarque-Bera test [2].
In [1]:
library(AER)
library(ggplot2)
library(tseries)
Second, we create the HousePrices data object from the AER package using the data function and print the first six rows and three columns of data using the head function to view the data.frame structure.
In [2]:
data(HousePrices)
head(HousePrices[, 1:3])
Out [2]:
  price lotsize bedrooms
1 42000    5850        3
2 38500    4000        2
3 49500    3060        3
4 60500    6650        3
5 61000    6360        2
6 66000    4160        3
Third, we fit the multiple linear regression using the lm function and store results within the mlr object. Within the lm function, parameter formula = price ~ lotsize + bedrooms fits the model where house price is explained by lot size and number of bedrooms.
In [3]:
mlr <- lm(formula = price ~ lotsize + bedrooms, data = HousePrices)
Fourth, we get residuals from the mlr multiple linear regression object and store them within the res object.
In [4]:
res <- mlr$residuals
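As a minimal sketch of what the residuals are, the block below refits a small model on simulated data and checks that the residuals component equals observed minus fitted values. The data frame sim, its columns and the fit object are hypothetical stand-ins, not the HousePrices data or the mlr object.

```r
# Simulated stand-in data (hypothetical; not the HousePrices data)
set.seed(1)
sim <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
sim$y <- 1 + 2 * sim$x1 - sim$x2 + rnorm(50)

# Multiple linear regression on the simulated data
fit <- lm(y ~ x1 + x2, data = sim)

# Residuals are observed values minus fitted values
res_manual <- sim$y - fitted(fit)

# The $residuals component and the residuals() extractor both
# return the same numbers as the manual calculation
same_as_component <- all.equal(unname(res_manual), unname(fit$residuals))
same_as_extractor <- all.equal(unname(res_manual), unname(residuals(fit)))
```

Both comparisons should evaluate to TRUE, so the residuals extractor function can be used in place of the $residuals component if preferred.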
Fifth, we do the normal Q-Q plot using the ggplot, stat_qq and stat_qq_line functions. Within the ggplot function, parameter data = data.frame(res) supplies residuals as a data.frame and aes(sample = res) maps the residuals variable to the sample aesthetic required by the stat_qq function. Within the stat_qq function, parameter distribution = qnorm supplies the normal distribution quantile function. Within the stat_qq_line function, parameter distribution = qnorm supplies the normal distribution quantile function and line.p = c(0.25, 0.75) supplies the 0.25 and 0.75 probabilities, so the reference line is drawn through the first and third quartiles of the sample and theoretical distributions.
In [5]:
ggplot(data = data.frame(res), aes(sample = res)) +
  stat_qq(distribution = qnorm) +
  stat_qq_line(distribution = qnorm, line.p = c(0.25, 0.75), color = "red") +
  labs(title = "Normal Q-Q Plot", x = "Theoretical Quantiles", y = "Sample Quantiles")
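Out [5]:
[Figure: Normal Q-Q Plot of the residuals, with Theoretical Quantiles on the x-axis, Sample Quantiles on the y-axis and the reference line in red.]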
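To make the role of line.p = c(0.25, 0.75) concrete, the sketch below computes the slope and intercept of a line through the first and third quartile points by hand in base R. It uses simulated residuals (res_sim is a hypothetical stand-in for res) and is an illustration of the idea rather than the ggplot2 source code.

```r
# Simulated stand-in for the regression residuals (hypothetical data)
set.seed(42)
res_sim <- rnorm(200, mean = 0, sd = 3)

# Probabilities that define the two points the line passes through
probs <- c(0.25, 0.75)

# Theoretical quartiles of the standard normal distribution (x-axis)
theo <- qnorm(probs)

# Sample quartiles of the residuals (y-axis)
samp <- quantile(res_sim, probs, names = FALSE)

# Line through the two (theoretical, sample) quartile points
slope <- diff(samp) / diff(theo)
intercept <- samp[1] - slope * theo[1]
```

Because the line is anchored at the quartiles rather than fitted to all points, a few extreme residuals in the tails do not move it, which makes departures in the tails easier to see.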
Sixth, we do the Jarque-Bera test using the jarque.bera.test function. Within the jarque.bera.test function, parameter x = res supplies the numeric vector of residuals.
In [6]:
jarque.bera.test(x = res)
Out [6]:
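	Jarque Bera Test

data:  res
X-squared = 146.85, df = 2, p-value < 2.2e-16

The p-value is far below any conventional significance level, so the null hypothesis that the residuals are normally distributed is rejected.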
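As a rough check on what the test statistic measures, the sketch below computes the textbook Jarque-Bera statistic in base R from sample skewness and excess kurtosis. The function name jb_by_hand and the simulated inputs are hypothetical, and it is an assumption, not verified here, that this matches the tseries implementation exactly.

```r
# Hand-rolled Jarque-Bera statistic (hypothetical helper, base R only)
jb_by_hand <- function(x) {
  n  <- length(x)
  d  <- x - mean(x)
  m2 <- mean(d^2)                 # second central moment
  m3 <- mean(d^3)                 # third central moment
  m4 <- mean(d^4)                 # fourth central moment
  skew   <- m3 / m2^(3 / 2)       # sample skewness
  exkurt <- m4 / m2^2 - 3         # sample excess kurtosis
  stat <- n / 6 * (skew^2 + exkurt^2 / 4)
  # Upper-tail probability of a chi-squared distribution with 2 df
  p <- pchisq(stat, df = 2, lower.tail = FALSE)
  c(statistic = stat, p.value = p)
}

# Simulated examples: a normal sample and a strongly skewed sample
set.seed(123)
jb_normal <- jb_by_hand(rnorm(500))
jb_skewed <- jb_by_hand(rexp(500))
```

The skewed exponential sample should give a much larger statistic than the normal sample, which is the same pattern as the large X-squared value reported for the house price residuals.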
References
[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.
Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.
[2] AER R Package: Christian Kleiber and Achim Zeileis. (2008). Applied Econometrics with R. Springer-Verlag, New York.
ggplot2 R Package: Hadley Wickham. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York.
tseries R Package: Adrian Trapletti and Kurt Hornik. (2020). tseries: Time Series Analysis and Computational Finance. R package version 0.10-48.