Stationarity: Augmented Dickey-Fuller Test in Python

Last Update: May 13, 2022

The augmented Dickey-Fuller test can be run in Python with the adfuller function from the statsmodels.tsa.stattools module of the statsmodels package. It tests the null hypothesis that a time series has a unit root, that is, that it is non-stationary, against the alternative that it is stationary or trend-stationary. The main parameters of adfuller are x, the time series data; regression, a string that selects the deterministic terms of the test regression ("c" for a constant, "ct" for a constant and a linear trend, "ctt" for a constant with linear and quadratic trends); and maxlag, the maximum lag order to include in the test.
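To make the role of these parameters concrete, the test regression for the constant plus linear trend case can be sketched as below. The symbols (alpha, beta, gamma, delta_i, p) are notation chosen here for illustration and are not taken from the statsmodels documentation.

```latex
\Delta y_t = \alpha + \beta t + \gamma y_{t-1}
           + \sum_{i=1}^{p} \delta_i \, \Delta y_{t-i} + \varepsilon_t,
\qquad H_0: \gamma = 0 \quad \text{versus} \quad H_1: \gamma < 0
```

Here y_t is the series passed as x, the deterministic terms alpha and beta t are the ones selected by regression="ct", and the number of lagged differences p is at most maxlag. Under the null hypothesis the series has a unit root. With regression="c" the trend term is dropped, and with regression="ctt" a quadratic term in t is added.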

As an example, we run the augmented Dickey-Fuller test on the training range of the AirPassengers data set included in the R datasets package [1].

First, we import pandas for data frames, statsmodels for downloading the data and running the augmented Dickey-Fuller test, and matplotlib for the training range chart [2].

In [1]:
import pandas as pd
import statsmodels.api as sm
import statsmodels.tsa.stattools as st
import matplotlib.pyplot as plt

Second, we create the mdata model data object with the get_rdataset function, convert it into a data frame with a monthly date index using the DataFrame function and the set_index method, and print the first five months of data with the head method to view the time series structure.

In [2]:
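# Download the AirPassengers data set from the R datasets package
# (this needs an internet connection the first time it runs).
mdata = sm.datasets.get_rdataset(dataname="AirPassengers",
                                 package="datasets",
                                 cache=True).data
# Keep the passenger counts and index them by month-end dates.
# Note: pandas 2.2 and later prefer freq="ME" over freq="M".
mdata = pd.DataFrame(data=mdata["value"]).set_index(
    pd.date_range(start="1949", end="1961", freq="M"))
print(mdata.head())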
Out [2]:
            value
1949-01-31    112
1949-02-28    118
1949-03-31    132
1949-04-30    129
1949-05-31    121

Third, we delimit the training range for model fitting as the first ten years of data and store it in the tdata object. Then, we delimit the testing range for model forecasting as the last two years of data and store it in the fdata object. This split is only an educational example and can be modified according to your needs.

In [3]:
tdata = mdata[:"1958-12-31"]
fdata = mdata["1959-01-01":]
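The split above relies on label-based slicing of a date index, which includes both endpoints. The following minimal sketch checks the resulting sizes on a hypothetical stand-in series, so it runs without downloading the data. The names demo, train and test, the placeholder values and the month-start stamps (freq="MS") are assumptions made for this illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data: 144 monthly rows, 1949-01 to 1960-12.
# Month-start stamps ("MS") are used here only for portability across
# pandas versions; the values are placeholders, not passenger counts.
index = pd.date_range(start="1949-01-01", periods=144, freq="MS")
demo = pd.DataFrame({"value": np.arange(144)}, index=index)

# Label-based slices on a DatetimeIndex include both endpoints.
train = demo.loc[:"1958-12-31"]   # 1949-01 through 1958-12
test = demo.loc["1959-01-01":]    # 1959-01 through 1960-12

print(len(train), len(test))                          # 120 24
print(train.index[-1].date(), test.index[0].date())   # 1958-12-01 1959-01-01
```

The two ranges do not overlap and together cover all 144 months, which is the property a training and testing split needs.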

Fourth, we chart the training range data with the plot, ylabel, xlabel and show functions. The plot function receives the training range data object, while the ylabel and xlabel functions receive the vertical and horizontal axis label strings.

In [4]:
plt.plot(tdata)
plt.ylabel("Air Passengers")
plt.xlabel("Year")
plt.show()
Out [4]:
Figure 1. Training range data.

Fifth, we run the augmented Dickey-Fuller test on the training range data with the adfuller function, store the results in the adf object and print the test statistic (adf[0]) and its MacKinnon approximate p-value (adf[1]). Within adfuller, x=tdata passes the training range data object, regression="ct" includes a constant and a deterministic linear trend in the test regression, and maxlag=12 allows up to twelve lagged differences of the training range values. Because the autolag parameter is left at its default of "AIC", the number of lagged differences actually used is selected by the Akaike information criterion among orders up to maxlag and is returned as adf[2]. In practice, we have to assess whether the constant and the deterministic linear trend are needed in the test regression, and how many lagged differences to include.

In [5]:
adf = st.adfuller(x=tdata, regression="ct", maxlag=12)
print("adf:", adf[0], "adf_pvalue:", adf[1])
Out [5]:
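adf: -1.844903301211624 adf_pvalue: 0.6827144827033781

Since the p-value of about 0.68 is well above common significance levels such as 0.05, the test does not reject the null hypothesis of a unit root, so the training range series is not found to be stationary around a deterministic linear trend.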

References

[1] Data Description: Monthly international airline passenger numbers in thousands from 1949 to 1960.

Original Source: Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1976). “Time Series Analysis, Forecasting and Control”. Third Edition. Holden-Day. Series G.

Source: datasets R Package AirPassengers Object. R Core Team (2021). “R: A language and environment for statistical computing”. R Foundation for Statistical Computing, Vienna, Austria.

[2] pandas Python package: Wes McKinney. (2010). Data Structures for Statistical Computing in Python, Proceedings of the 9th Python in Science Conference, 51-56.

statsmodels Python package: Seabold, Skipper, and Josef Perktold. (2010). “statsmodels: Econometric and statistical modeling with python”. Proceedings of the 9th Python in Science Conference.

matplotlib Python package: John D. Hunter. (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering, 9, 90-95.
