ARIMA Models Identification: Correlograms in R

Last Update: June 21, 2022

ARIMA model identification with correlograms can be done in R using the forecast package ggAcf and ggPacf functions, which help identify the autoregressive and moving average orders of an ARIMA model. The ggAcf and ggPacf functions plot the autocorrelation function (ACF) and partial autocorrelation function (PACF) correlograms. Their main parameters are x, the time series data; ci, the confidence level of the correlogram confidence interval; and lag.max, the maximum lag at which the correlogram is calculated.
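To make the ci parameter concrete, here is a small sketch that computes the dashed confidence bounds usually drawn on an ACF correlogram. It assumes the standard white-noise approximation, plus or minus qnorm((1 + ci) / 2) / sqrt(n) with n the number of observations, which to my understanding is what ggAcf draws. The variable names are illustrative.

```r
# Training range: first ten years of AirPassengers (120 monthly observations).
tdata <- window(AirPassengers, end = c(1958, 12))

# Assumed white-noise approximation for the correlogram confidence bounds:
# +/- qnorm((1 + ci) / 2) / sqrt(n).
ci <- 0.95
n <- length(tdata)
bound <- qnorm((1 + ci) / 2) / sqrt(n)

# Autocorrelations outside +/- bound are treated as statistically significant.
print(round(bound, 4))  # about 0.1789
```

A higher ci widens the bounds, so fewer lags look significant.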

As an example, we can identify the autoregressive order p and moving average order q of a classical ARIMA(p,d,q) model fitted to the training range of a univariate time series. We use ACF and PACF correlograms of the AirPassengers object included in the datasets package [1]. Notice that we also need to decide whether to use the series in levels or differenced d times, which sets the ARIMA model integration order d.
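One way to decide on d is the forecast package ndiffs function, which uses unit-root tests (KPSS by default) to estimate the number of first differences needed. The sketch below assumes that approach. If it suggests d > 0, correlograms would then be drawn on the differenced series. The ddata name is illustrative.

```r
library(forecast)

# Training range: first ten years of AirPassengers.
tdata <- window(AirPassengers, end = c(1958, 12))

# Estimated number of non-seasonal differences for stationarity.
d <- ndiffs(tdata)
cat("suggested d =", d, "\n")

# Difference the series d times (leave it in levels when d = 0).
ddata <- if (d > 0) diff(tdata, differences = d) else tdata
```

Each difference drops one observation, so ddata has length(tdata) - d values.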

First, we load the forecast package for its time series functions and the ggplot2 package, on which the training range chart and the ACF and PACF correlogram charts are built [2].

In [1]:
library(forecast)
library(ggplot2)

Second, we create the mdata model data object as a copy of the AirPassengers object from the datasets package. We then print the first six months of data with the head function to view the time series object structure.

In [2]:
mdata <- AirPassengers
head(mdata)
Out [2]:
     Jan Feb Mar Apr May Jun
1949 112 118 132 129 121 135
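The head output shows only the values. The sketch below uses the standard stats accessors to inspect the time attributes a ts object carries alongside them.

```r
mdata <- AirPassengers

# A ts object stores its time attributes with the values.
start(mdata)      # 1949 1  (first year and month)
end(mdata)        # 1960 12 (last year and month)
frequency(mdata)  # 12 observations per year (monthly data)
length(mdata)     # 144 observations in total
```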

Third, we delimit the training range for model fitting as the first ten years of data with the window function and store the outcome in the tdata object. Within window, the parameter x = mdata is the full range model data and end = c(1958, 12) is the training range end time. Then, we delimit the testing range for model forecasting as the last two years of data with the window function and store the outcome in the fdata object. Within window, x = mdata is the full range model data and start = c(1959, 1) is the testing range start time. Notice that this training and testing range split is only an educational example and can be modified according to your needs.

In [3]:
tdata <- window(x = mdata, end = c(1958, 12))
fdata <- window(x = mdata, start = c(1959, 1))
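A quick sanity check on the split confirms that the two ranges are contiguous and together cover the full series. It assumes the same window calls as above.

```r
mdata <- AirPassengers
tdata <- window(x = mdata, end = c(1958, 12))
fdata <- window(x = mdata, start = c(1959, 1))

# Training: 10 years x 12 months; testing: 2 years x 12 months.
c(training = length(tdata), testing = length(fdata))

# Together the two ranges should cover the full series with no gap or overlap.
stopifnot(length(tdata) + length(fdata) == length(mdata))
```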

Fourth, we chart the training range data with the autoplot and labs functions. Within autoplot, the parameter object = tdata is the training range data object. Within labs, y = "Air Passengers" sets the vertical axis label and x = "Year" sets the horizontal axis label.

In [4]:
autoplot(object = tdata) + labs(y = "Air Passengers", x = "Year")
Out [4]:
Figure 1. Training range data.
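To see where the training range ends and the testing range begins, the testing range can be overlaid on the same chart. This sketch assumes the autolayer method that the forecast package provides for ts objects. The "Testing range" and "Range" labels are illustrative.

```r
library(forecast)
library(ggplot2)

mdata <- AirPassengers
tdata <- window(x = mdata, end = c(1958, 12))
fdata <- window(x = mdata, start = c(1959, 1))

# Training range as the base line; testing range added as a coloured layer.
p <- autoplot(object = tdata) +
  autolayer(fdata, series = "Testing range") +
  labs(y = "Air Passengers", x = "Year", colour = "Range")
p
```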

Fifth, we chart the training range ACF correlogram with the ggAcf function. Within ggAcf, the parameter x = tdata is the training range data object, ci = 0.95 sets a 95% confidence level for the correlogram confidence interval, and lag.max = 24 calculates the correlogram up to lag 24. Notice that these parameter values are only educational examples and can be modified according to your needs.

In [5]:
ggAcf(x = tdata, ci = 0.95, lag.max = 24)
Out [5]:
Figure 2. Training range time series autocorrelation function correlogram.
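The correlogram can also be read numerically. This sketch uses stats::acf with plot = FALSE to get the autocorrelations up to lag 24. It lists the lags outside the assumed white-noise bound qnorm((1 + ci) / 2) / sqrt(n). The acf_r and acf_sig names are illustrative.

```r
tdata <- window(AirPassengers, end = c(1958, 12))
ci <- 0.95
bound <- qnorm((1 + ci) / 2) / sqrt(length(tdata))

# stats::acf returns lag 0 (always 1) first; drop it to keep lags 1..24.
acf_obj <- stats::acf(tdata, lag.max = 24, plot = FALSE)
acf_r <- as.numeric(acf_obj$acf)[-1]

# Lags (in months) whose autocorrelation falls outside +/- bound.
acf_sig <- which(abs(acf_r) > bound)
print(round(acf_r, 2))
print(acf_sig)
```

A long run of significant lags that shrinks only slowly is the tail-off pattern described in the identification rules.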

Sixth, we chart the training range PACF correlogram with the ggPacf function. Within ggPacf, the parameter x = tdata is the training range data object, ci = 0.95 sets a 95% confidence level for the correlogram confidence interval, and lag.max = 24 calculates the correlogram up to lag 24. Notice that these parameter values are only educational examples and can be modified according to your needs.

In [6]:
ggPacf(x = tdata, ci = 0.95, lag.max = 24)
Out [6]:
Figure 3. Training range time series partial autocorrelation function correlogram.
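The same numeric reading applies to the partial autocorrelations. This sketch uses stats::pacf, which starts at lag 1 and has no lag 0 entry. It uses the same assumed white-noise bound as the ACF case. The pacf_r and pacf_sig names are illustrative.

```r
tdata <- window(AirPassengers, end = c(1958, 12))
ci <- 0.95
bound <- qnorm((1 + ci) / 2) / sqrt(length(tdata))

# stats::pacf returns lags 1..lag.max directly.
pacf_obj <- stats::pacf(tdata, lag.max = 24, plot = FALSE)
pacf_r <- as.numeric(pacf_obj$acf)

# Lags (in months) whose partial autocorrelation falls outside +/- bound.
pacf_sig <- which(abs(pacf_r) > bound)
print(round(pacf_r, 2))
print(pacf_sig)
```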

Seventh, we identify the autoregressive order p and moving average order q of the training range ARIMA(p,d,q) model with the following rules of thumb.

  • If the ACF correlogram tails off gradually and the PACF correlogram cuts off after p statistically significant lags, this suggests a potential autoregressive model AR(p) of order p.
  • Alternatively, if the ACF correlogram cuts off after q statistically significant lags and the PACF correlogram tails off gradually, this suggests a potential moving average model MA(q) of order q.
  • Otherwise, if the ACF correlogram tails off gradually after q statistically significant lags and the PACF correlogram tails off gradually after p statistically significant lags, this suggests a potential autoregressive moving average model ARMA(p,q) of orders p and q.
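The rules of thumb above can be roughly automated. The sketch below is a simplification, not a standard algorithm. The leading_run and suggest_order helpers are hypothetical. The max_cut threshold that separates a sharp cut-off from a gradual tail-off is an arbitrary assumption. It counts consecutive significant lags from lag 1 in each correlogram and applies the three rules.

```r
# Hypothetical helper: number of consecutive significant lags starting at lag 1.
leading_run <- function(r, bound) {
  sig <- abs(r) > bound
  if (!sig[1]) return(0L)
  first_insig <- which(!sig)[1]
  if (is.na(first_insig)) length(r) else first_insig - 1L
}

# Hypothetical helper: apply the rules of thumb. max_cut is an assumed
# threshold; runs up to max_cut lags count as a cut-off, longer runs as tail-off.
suggest_order <- function(acf_r, pacf_r, bound, max_cut = 3L) {
  q <- leading_run(acf_r, bound)
  p <- leading_run(pacf_r, bound)
  acf_cuts <- q <= max_cut
  pacf_cuts <- p <= max_cut
  if (!acf_cuts && pacf_cuts) {
    sprintf("AR(%d)", p)
  } else if (acf_cuts && !pacf_cuts) {
    sprintf("MA(%d)", q)
  } else if (!acf_cuts && !pacf_cuts) {
    "ARMA(p,q): both correlograms tail off; read p and q from the charts"
  } else {
    "unclear: both correlograms cut off; inspect the charts"
  }
}

# Synthetic example: geometric ACF decay with one PACF spike looks like AR(1).
bound <- qnorm(0.975) / sqrt(120)
acf_demo <- 0.9^(1:24)
pacf_demo <- c(0.9, rep(0, 23))
suggest_order(acf_demo, pacf_demo, bound)
```

The suggestion is only a starting point, and the correlogram charts themselves should confirm it.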

References

[1] Data Description: Monthly international airline passenger numbers in thousands from 1949 to 1960.

Original Source: Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1976). “Time Series Analysis, Forecasting and Control”. Third Edition. Holden-Day. Series G.

Source: datasets R Package AirPassengers Object. R Core Team (2021). “R: A language and environment for statistical computing”. R Foundation for Statistical Computing, Vienna, Austria.

[2] forecast R Package. Hyndman R, Athanasopoulos G, Bergmeir C, Caceres G, Chhay L, O’Hara-Wild M, Petropoulos F, Razbash S, Wang E, Yasmeen F (2022). “forecast: Forecasting functions for time series and linear models”. R package version 8.16.

ggplot2 R Package. Wickham H (2016). “ggplot2: Elegant Graphics for Data Analysis”. Springer-Verlag New York.

My online courses are closed for enrollment.
+