Last Update: May 13, 2022
Stationarity: Augmented Dickey-Fuller Test in Python can be done using the `statsmodels` package `adfuller` function, found within its `statsmodels.tsa.stattools` module, for evaluating whether a time series is stationary, that is, whether its mean does not change over time. The null hypothesis of the test is that the series has a unit root (is non-stationary). The main parameters of the `adfuller` function are `x` with the time series data, `regression` with a string stating whether to include a constant, a deterministic linear trend or a quadratic trend variable within the test regression, and `maxlag` with the maximum lag order to include within the test.
As an example, we can run the augmented Dickey-Fuller test on a training range of the data included within the `datasets` R package `AirPassengers` object [1].
First, we import the packages `pandas` for data frames, `statsmodels` for data downloading and the augmented Dickey-Fuller test, and `matplotlib` for the training range chart [2].
In [1]:
import pandas as pd
import statsmodels.api as sm
import statsmodels.tsa.stattools as st
import matplotlib.pyplot as plt
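Second, we create the `mdata` model data object using the `get_rdataset` function, convert the `mdata` object into a data frame indexed by month-end dates using the `DataFrame` function, the `set_index` method and the `date_range` function, and display the first five months of data using the `print` function and the `head` data frame method to view the time series structure.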
In [2]:
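# Download the AirPassengers object from the datasets R package.
mdata = sm.datasets.get_rdataset(dataname="AirPassengers",
                                 package="datasets",
                                 cache=True).data
# Keep the passenger counts and index them by month-end dates (144 months).
mdata = pd.DataFrame(data=mdata["value"]).set_index(
    pd.date_range(start="1949", end="1961", freq="M"))
print(mdata.head())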
Out [2]:
value
1949-01-31 112
1949-02-28 118
1949-03-31 132
1949-04-30 129
1949-05-31 121
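One caveat about the date index: pandas 2.2 and later deprecate the `"M"` frequency alias in favor of `"ME"`, so the call above may emit a warning on recent versions. The following sketch is one version-independent alternative (an assumption, not the original method): it builds month-start dates and rolls them forward to month end.

```python
import pandas as pd

# Month-start stamps rolled forward to month end; avoids the "M"/"ME" alias.
index = (pd.date_range(start="1949-01-01", periods=144, freq="MS")
         + pd.offsets.MonthEnd(0))

print(index[0].date(), index[-1].date(), len(index))
# prints: 1949-01-31 1960-12-31 144
```

The resulting dates match the month-end index shown in the output above.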
Third, we delimit the training range for model fitting as the first ten years of data and store the outcome within the `tdata` object. Then, we delimit the testing range for model forecasting as the last two years of data and store the outcome within the `fdata` object. Notice that this split into training and testing ranges is only included as an educational example and can be modified according to your needs.
In [3]:
tdata = mdata[:"1958-12-31"]
fdata = mdata["1959-01-01":]
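As a quick sanity check of this kind of date-based split, the sketch below uses a synthetic stand-in for `mdata` (the `demo` data frame and its values are hypothetical) and confirms that the two ranges have the expected sizes and do not overlap.

```python
import pandas as pd

# Synthetic stand-in for mdata: 144 monthly values (hypothetical, not the real data).
index = pd.date_range(start="1949-01-01", periods=144, freq="MS")
demo = pd.DataFrame({"value": range(144)}, index=index)

train = demo.loc[:"1958-12-31"]   # label slicing includes both endpoints
test = demo.loc["1959-01-01":]

print(len(train), len(test))      # prints: 120 24
print(train.index.max() < test.index.min())  # prints: True
```

Ten years of monthly data give 120 training observations and two years give 24 testing observations.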
Fourth, we view the training range data with the `plot`, `ylabel` and `xlabel` functions. The `plot` function receives the training range data object, while the `ylabel` and `xlabel` functions receive the vertical axis label and horizontal axis label strings.
In [4]:
plt.plot(tdata)
plt.ylabel("Air Passengers")
plt.xlabel("Year")
plt.show()
Out [4]:
![](https://www.datascienceconcepts.com/wp-content/uploads/2022/05/Stationarity.-Training-Range-Time-Series-Python.png)
Figure 1. Training range data.
Fifth, we run the augmented Dickey-Fuller test on the training range data with the `adfuller` function, store the results within the `adf` object and print its `adf` test statistic and `adf_pvalue` MacKinnon approximate p-value. Within the `adfuller` function, the parameter `x=tdata` passes the training range data object, `regression="ct"` includes a constant and a deterministic linear trend variable within the test regression, and `maxlag=12` allows up to twelve lags of the differenced training range values. Because the `autolag` parameter defaults to `"AIC"`, the function chooses the number of lags, up to that maximum, that minimizes the Akaike information criterion. Notice that we still have to test whether the constant and the deterministic linear trend variable are needed within the test regression, and how many lags of the differenced values to include.
In [5]:
adf = st.adfuller(x=tdata, regression="ct", maxlag=12)
print("adf:", adf[0], "adf_pvalue:", adf[1])
Out [5]:
adf: -1.844903301211624 adf_pvalue: 0.6827144827033781
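To read this result, the p-value is compared with a chosen significance level. The helper below is a hypothetical illustration (the function name `interpret_adf` and the 5% level are assumptions, not part of `statsmodels`).

```python
def interpret_adf(pvalue, alpha=0.05):
    """Return a plain-language reading of an ADF p-value.

    The null hypothesis of the test is that the series has a unit root.
    """
    if pvalue < alpha:
        return "reject unit root"
    return "fail to reject unit root"


print(interpret_adf(0.6827144827033781))
# prints: fail to reject unit root
```

With a p-value of about 0.68, the test does not reject the unit root null hypothesis at the 5% level, so the training range data is not shown to be stationary around a constant and a linear trend.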
References
[1] Data Description: Monthly international airline passenger numbers in thousands from 1949 to 1960.
Original Source: Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1976). “Time Series Analysis, Forecasting and Control”. Third Edition. Holden-Day. Series G.
Source: datasets R Package AirPassengers Object. R Core Team (2021). “R: A language and environment for statistical computing”. R Foundation for Statistical Computing, Vienna, Austria.
[2] pandas Python package: Wes McKinney. (2010). Data Structures for Statistical Computing in Python, Proceedings of the 9th Python in Science Conference, 51-56.
statsmodels Python package: Seabold, Skipper, and Josef Perktold. (2010). “statsmodels: Econometric and statistical modeling with python”. Proceedings of the 9th Python in Science Conference.
matplotlib Python package: John D. Hunter. (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering, 9, 90-95.