# ARIMA Models Identification: Correlograms

Last Update: June 20, 2022

Correlograms are used to identify the autoregressive and moving average orders of ARIMA models.

As an example, we can split a univariate time series $y_{t}$ into a training range $y_{t(a)}$ for model fitting and a testing range $y_{t(b)}$ for model forecasting.

Then, we can identify the training range univariate time series $ARIMA(p,d,q)$ model autoregressive order $p$ and moving average order $q$ by estimating the sample autocorrelation and partial autocorrelation function correlograms. Notice that we need to evaluate whether the level or the $d$-order differenced training range series is required, which determines the ARIMA model integration order $d$.
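The level-versus-differenced decision can be sketched as follows. This is a minimal illustration with a simulated random walk with drift; the series and the variance heuristic are assumptions for illustration, not from the text:

```python
import numpy as np

# Simulated non-stationary series: a random walk with drift (an assumption
# for illustration; real training range data would replace this)
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(0.5, 1.0, size=120))

# First difference: candidate for integration order d = 1
y_diff = np.diff(y, n=1)

# Rough heuristic: differencing a trending series shrinks its variance;
# a formal unit-root test (e.g. ADF) is the usual next step
print(y.var() > y_diff.var())
```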

Next, we estimate the training range univariate time series sample autocorrelation function correlogram with formula $\hat{\rho}_{h}=\frac{\sigma\left(y_{t(a)},y_{t(a)-h}\right)}{\sigma^{2}_{y(a)}}\;(1)$. The estimated lag-$h$ sample autocorrelation $\hat{\rho}_{h}$ is the estimated sample covariance between the univariate time series $y_{t(a)}$ and its lag-$h$ values $y_{t(a)-h}$, divided by the estimated sample variance $\sigma^{2}_{y(a)}$. The training range autocorrelation function measures the linear dependence between current $y_{t(a)}$ and lagged $y_{t(a)-h}$ univariate time series data.
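Formula (1) can be computed directly. The sketch below assumes NumPy and a white-noise input series, both illustrative choices rather than anything prescribed by the text:

```python
import numpy as np

def sample_acf(y, max_lag):
    """Lag-h sample autocorrelation per formula (1): the sample covariance
    of (y_t, y_{t-h}) divided by the sample variance."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    y_c = y - y.mean()
    denom = np.sum(y_c ** 2)  # proportional to the sample variance
    return np.array([np.sum(y_c[h:] * y_c[:n - h]) / denom
                     for h in range(max_lag + 1)])

rng = np.random.default_rng(1)
y_train = rng.normal(size=200)          # stand-in for y_t(a)
rho = sample_acf(y_train, max_lag=10)   # rho[0] is 1 by construction
```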

After that, we estimate the training range sample autocorrelation function correlogram confidence intervals with formula $ci_{h}=\pm N^{-1}\left(1-\frac{\alpha}{2}\right)\sqrt{\frac{1+2\sum_{i=1}^{h-1}\hat{\rho}_{i}^{2}}{n(a)}}\;(2)$. The confidence intervals $ci_{h}$ are the inverse of the standard normal cumulative distribution function $N^{-1}$, evaluated at one minus the statistical significance level $\alpha$ divided by two, multiplied by the square root of one plus two times the sum of the squared sample autocorrelations $\hat{\rho}^{2}_{i}$, divided by the number of observations $n(a)$. Notice that we need to evaluate whether Bartlett's formula is appropriate for estimating the sample autocorrelation function correlogram confidence intervals.
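A sketch of formula (2); the use of SciPy's `norm.ppf` for the inverse standard normal CDF and the illustrative ACF values are assumptions:

```python
import numpy as np
from scipy.stats import norm

def acf_conf_int(rho, n_obs, alpha=0.05):
    """Formula (2): +/- bands that widen with the cumulated squared sample
    autocorrelations (Bartlett's approximation).
    rho starts at lag 0; bands are returned for lags 1..len(rho)-1."""
    z = norm.ppf(1 - alpha / 2)
    # the variance at lag h uses rho_1^2 .. rho_{h-1}^2, so shift by one lag
    cum = np.concatenate(([0.0], np.cumsum(rho[1:-1] ** 2)))
    return z * np.sqrt((1 + 2 * cum) / n_obs)

rho = np.array([1.0, 0.5, 0.3, 0.1])   # illustrative sample ACF values
ci = acf_conf_int(rho, n_obs=100)      # bands for lags 1, 2, 3
```

Note the bands are non-decreasing in the lag, unlike the flat white-noise band of formula (5).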

Later, we estimate the training range univariate time series sample partial autocorrelation function correlogram with formula $\hat{\phi}_{h}=\hat{\beta}_{h}\;(3)$. The lag-$h$ sample partial autocorrelation $\hat{\phi}_{h}$ can be estimated as the lag-$h$ estimated coefficient $\hat{\beta}_{h}$ of the linear regression $\hat{y}_{t(a)}=\hat{\beta}_{0}+\sum_{i=1}^{h}\hat{\beta}_{i}\,y_{t(a)-i}\;(4)$. The training range partial autocorrelation function measures the linear dependence between current $y_{t(a)}$ and lagged $y_{t(a)-h}$ univariate time series data after removing the linear dependence on the intermediate lags $y_{t(a)-1},\dots,y_{t(a)-h+1}$. Notice that we can also estimate the sample partial autocorrelation function using the Yule-Walker, Levinson-Durbin, or adjusted linear regression methods.
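Formulas (3) and (4) suggest the regression route sketched below; the AR(1) test series and the NumPy least-squares call are assumptions for illustration:

```python
import numpy as np

def pacf_by_regression(y, max_lag):
    """Lag-h sample partial autocorrelation as the last OLS coefficient
    beta_h of a regression of y_t on an intercept and y_{t-1}..y_{t-h}
    (formulas (3) and (4))."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    phis = []
    for h in range(1, max_lag + 1):
        # design matrix: intercept column plus the h lagged columns
        X = np.column_stack([np.ones(n - h)] +
                            [y[h - i:n - i] for i in range(1, h + 1)])
        beta, *_ = np.linalg.lstsq(X, y[h:], rcond=None)
        phis.append(beta[-1])  # beta_h is the lag-h partial autocorrelation
    return np.array(phis)

# AR(1) series with coefficient 0.7: its PACF should cut off after lag 1
rng = np.random.default_rng(2)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.7 * y[t - 1] + rng.normal()
phi = pacf_by_regression(y, max_lag=5)
```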

Then, we estimate the training range sample partial autocorrelation function correlogram confidence intervals with formula $ci_{h}=\pm N^{-1}\left(1-\frac{\alpha}{2}\right)\frac{1}{\sqrt{n(a)}}\;(5)$. The confidence intervals $ci_{h}$ are the inverse of the standard normal cumulative distribution function $N^{-1}$, evaluated at one minus the statistical significance level $\alpha$ divided by two, multiplied by one divided by the square root of the number of observations $n(a)$.
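Formula (5) is a flat band, so it reduces to a one-liner; SciPy's `norm.ppf` is again an assumed choice for the inverse normal CDF:

```python
from math import sqrt
from scipy.stats import norm

def pacf_conf_int(n_obs, alpha=0.05):
    """Formula (5): constant +/- band for the sample PACF correlogram."""
    return norm.ppf(1 - alpha / 2) / sqrt(n_obs)

band = pacf_conf_int(n_obs=120)  # roughly 0.179 at the 5% level
```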

Next, we can identify the training range univariate time series $ARIMA(p,d,q)$ model autoregressive order $p$ and moving average order $q$:

• If the sample autocorrelation function correlogram $\hat{\rho}_{h}$ tails off gradually and the sample partial autocorrelation function correlogram $\hat{\phi}_{h}$ cuts off after $p$ statistically significant lags, then we can observe the potential need of an autoregressive model $AR(p)$ of order $p$.
• Alternatively, if the sample autocorrelation function correlogram $\hat{\rho}_{h}$ cuts off after $q$ statistically significant lags and the sample partial autocorrelation function correlogram $\hat{\phi}_{h}$ tails off gradually, then we can observe the potential need of a moving average model $MA(q)$ of order $q$.
• Otherwise, if the sample autocorrelation function correlogram $\hat{\rho}_{h}$ tails off gradually after $q$ statistically significant lags and the sample partial autocorrelation function correlogram $\hat{\phi}_{h}$ tails off gradually after $p$ statistically significant lags, then we can observe the potential need of an autoregressive moving average model $ARMA(p,q)$ of orders $p$ and $q$.

Below, we find an example of training range univariate time series $ARIMA(p,d,q)$ model autoregressive order $p$ and moving average order $q$ identification correlograms using the airline passengers data [1], with the training range as the first ten years and the testing range as the last two years of data. Correlogram confidence intervals use an $\alpha=5\%$ statistical significance level. Notice that this significance level is only included as an educational example and can be modified according to your needs.

References

[1] Data Description: Monthly international airline passenger numbers in thousands from 1949 to 1960.

Original Source: Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1976). “Time Series Analysis, Forecasting and Control”. Third Edition. Holden-Day. Series G.

Source: datasets R Package AirPassengers Object. R Core Team (2021). “R: A language and environment for statistical computing”. R Foundation for Statistical Computing, Vienna, Austria.
