Last Update: May 13, 2022
Stationarity: Augmented Dickey-Fuller Test in Python can be done using the adfuller function from the statsmodels.tsa.stattools module, which evaluates whether a time series has a unit root, that is, whether it is non-stationary (for example, because its mean changes over time). The main parameters of the adfuller function are x with the time series data, regression with a string indicating whether to include a constant, a deterministic linear trend or a quadratic trend variable in the test regression, and maxlag with the maximum lag order to include in the test.
As an example, we can do a training range augmented Dickey-Fuller test using the AirPassengers object included in the datasets R package.
First, we import the packages pandas for data frames, statsmodels for data downloading and the augmented Dickey-Fuller test, and matplotlib for the training range chart.
In :
import pandas as pd
import statsmodels.api as sm
import statsmodels.tsa.stattools as st
import matplotlib.pyplot as plt
Second, we create the mdata model data object using the get_rdataset function, convert the mdata object into a data frame with a monthly date index using the DataFrame function and the set_index method, and display the first five months of data using the head data frame method to view the time series structure.
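In :
# Download the AirPassengers dataset from the datasets R package
mdata = sm.datasets.get_rdataset(dataname="AirPassengers", package="datasets", cache=True).data
# Keep the passenger counts and index them by month-end dates
# (pandas 2.2 and later prefer the alias "ME" over "M")
mdata = pd.DataFrame(data=mdata["value"]).set_index(
    pd.date_range(start="1949", end="1961", freq="M"))
print(mdata.head())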
Out :
            value
1949-01-31    112
1949-02-28    118
1949-03-31    132
1949-04-30    129
1949-05-31    121
Third, we delimit the training range for model fitting as the first ten years of data and store the outcome within the tdata object. Then, we delimit the testing range for model forecasting as the last two years of data and store the outcome within the fdata object. Notice that this delimiting of training and testing ranges is only included as an educational example and can be modified according to your needs.
In :
tdata = mdata[:"1958-12-31"]
fdata = mdata["1959-01-01":]
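The date-based split can be checked on a stand-in data frame that needs no download. This is a minimal sketch under assumptions: the demo object, its placeholder values and its month-start index are hypothetical (the article's mdata uses month-end dates), and .loc is used here as a more explicit form of the same label slicing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for mdata: 144 monthly placeholder values
# on a month-start index covering 1949 through 1960.
index = pd.date_range(start="1949-01-01", periods=144, freq="MS")
demo = pd.DataFrame({"value": np.arange(144)}, index=index)

# Label slices on a DatetimeIndex include both endpoints.
train = demo.loc[:"1958-12-31"]     # first ten years
test = demo.loc["1959-01-01":]      # last two years

print(len(train), len(test))
print(train.index[-1].date(), test.index[0].date())
```

Because the two slices share no dates, the training and testing ranges do not overlap and together cover the whole series.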
Fourth, we view the training range data with the plot, ylabel and xlabel functions. Within the plot function, the training range data object is included. Within the ylabel and xlabel functions, the vertical axis label and horizontal axis label strings are included.
In :
plt.plot(tdata)
plt.ylabel("Air Passengers")
plt.xlabel("Year")
plt.show()
Fifth, we do the training range data augmented Dickey-Fuller test with the adfuller function, store the results within the adf object and print its first element, the adf test statistic, and its second element, the adf_pvalue MacKinnon approximated p-value. Within the adfuller function, parameter x=tdata includes the training range data object, regression="ct" contains the string to include a constant and a deterministic linear trend variable within the test regression, and maxlag=12 sets twelve as the maximum number of lags of training range value differences considered when calculating the test statistic. With the default autolag="AIC" setting, the number of lags actually used is selected by information criterion up to that maximum. Notice that in practice we have to test whether the constant and the deterministic linear trend variable are needed and how many lags of training range value differences to include within the test regression.
In :
adf = st.adfuller(x=tdata, regression="ct", maxlag=12)
# adf[0] is the test statistic, adf[1] is the MacKinnon approximate p-value
print("adf:", adf[0], "adf_pvalue:", adf[1])
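Out :
adf: -1.844903301211624 adf_pvalue: 0.6827144827033781
Since the p-value of roughly 0.68 is well above conventional significance levels such as 0.05, the test does not reject the null hypothesis of a unit root, so the training range data is not found to be stationary around a deterministic linear trend.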
 Data Description: Monthly international airline passenger numbers in thousands from 1949 to 1960.
Original Source: Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1976). “Time Series Analysis, Forecasting and Control”. Third Edition. Holden-Day. Series G.
Source: datasets R Package AirPassengers Object. R Core Team (2021). “R: A language and environment for statistical computing”. R Foundation for Statistical Computing, Vienna, Austria.
 pandas Python package: Wes McKinney. (2010). Data Structures for Statistical Computing in Python, Proceedings of the 9th Python in Science Conference, 51-56.
statsmodels Python package: Seabold, Skipper, and Josef Perktold. (2010). “statsmodels: Econometric and statistical modeling with python”. Proceedings of the 9th Python in Science Conference.
matplotlib Python package: John D. Hunter. (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering, 9, 90-95.