# Time Series Stationarity

Created: Testing for Stationarity in Time Series Data


A time series is usually described in terms of four components:

• Trend
• Seasonality
• Irregularity
• Cyclicality

# When not to use Time Series Analysis

• Values are constant - there is nothing to analyse
• Values are given by a known function - just evaluate the function

# Stationarity

A series is stationary when it has:

• Constant mean
• Constant variance
• Autocovariance that depends only on the lag between observations, not on time

Because its statistical properties do not change over time, a stationary series is likely to follow the same pattern in the future
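As a rough illustration of these properties, the sketch below compares simulated white noise (stationary) with a random walk (non-stationary) by computing the mean and variance over the two halves of each series. The simulated data, the seed, and the `half_stats` helper are illustrative assumptions and not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# White noise: constant mean and variance, so it is stationary
white_noise = rng.normal(loc=0.0, scale=1.0, size=n)

# Random walk: cumulative sum of the noise; its variance grows with time
random_walk = np.cumsum(white_noise)


def half_stats(series):
    """Return (mean, variance) for the first and second half of a series."""
    first, second = np.array_split(series, 2)
    return (first.mean(), first.var()), (second.mean(), second.var())


for name, series in [("white noise", white_noise), ("random walk", random_walk)]:
    (m1, v1), (m2, v2) = half_stats(series)
    print(f"{name}: mean {m1:.2f} -> {m2:.2f}, variance {v1:.2f} -> {v2:.2f}")
```

For the white noise the two halves should give similar values, while for the random walk they will generally differ.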

## Stationarity Tests

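• Rolling Statistics - moving average, moving variance, visualization
• Augmented Dickey-Fuller (ADF) test - a hypothesis test whose null hypothesis is that the series is non-stationary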

## ARIMA

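ARIMA (AutoRegressive Integrated Moving Average) is a common model for time series analysis and forecasting

The ARIMA model has the following parameters:

• p - order of the Auto Regressive (AR) part
• d - degree of differencing, the Integration (I) part
• q - order of the Moving Average (MA) part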
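To give a feel for the AR and I parts, here is a minimal NumPy sketch that builds a series with one autoregressive term and one level of integration, then shows that differencing once recovers the autoregressive series. The coefficient `phi`, the seed, and the series length are arbitrary assumptions, and the MA part is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)
n, phi = 500, 0.6

# AR part (order 1): each value depends on the previous value plus noise
noise = rng.normal(size=n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = phi * ar1[t - 1] + noise[t]

# I part (one level of integration): the cumulative sum is non-stationary
integrated = np.cumsum(ar1)

# Differencing once undoes the integration and recovers the AR(1) series
differenced = np.diff(integrated)
print(np.allclose(differenced, ar1[1:]))

# Least-squares estimate of the AR coefficient from the differenced series
phi_hat = np.dot(differenced[:-1], differenced[1:]) / np.dot(differenced[:-1], differenced[:-1])
print(f"estimated phi: {phi_hat:.2f}")
```

In practice the orders are passed to a library rather than handled by hand, for example as `order=(p, d, q)` to `statsmodels.tsa.arima.model.ARIMA`.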

# Applying the Above

```python
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session
```

```python
import seaborn as sns

df = pd.read_csv('/kaggle/input/air-passengers/AirPassengers.csv')
df.head()
```


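|   | Month | #Passengers |
|---|---|---|
| 0 | 1949-01 | 112 |
| 1 | 1949-02 | 118 |
| 2 | 1949-03 | 132 |
| 3 | 1949-04 | 129 |
| 4 | 1949-05 | 121 |

```python
# An explicit format replaces infer_datetime_format, which is deprecated in pandas 2.x
df['Month'] = pd.to_datetime(df['Month'], format='%Y-%m')
df = df.set_index(['Month'])
df.head()
```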


| Month | #Passengers |
|---|---|
| 1949-01-01 | 112 |
| 1949-02-01 | 118 |
| 1949-03-01 | 132 |
| 1949-04-01 | 129 |
| 1949-05-01 | 121 |

```python
sns.lineplot(data=df)
```

In the resulting plot we can see that there is an upward trend as well as some seasonality

Next, we can check some summary statistics using a rolling mean approach

## Rolling Averages

Note that for the rolling functions we use a window of 12 because the data is monthly and has a seasonality of 12 months
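The effect of matching the window to the seasonal period can be seen on a small synthetic example. The series below is hypothetical (a repeating 12-month pattern plus a linear trend) and is only meant to show two things: the first 11 rolling values are missing, and the seasonal pattern averages out of the rolling mean.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: three years of a repeating 12-month pattern plus a linear trend
index = pd.date_range("2000-01-01", periods=36, freq="MS")
pattern = np.tile(np.arange(12), 3)
series = pd.Series(pattern + 2.0 * np.arange(36), index=index)

rolling_mean = series.rolling(window=12).mean()

# The first 11 values are NaN because a full 12-month window is not yet available
print(rolling_mean.isna().sum())

# With a window equal to the seasonal period the seasonal pattern averages out,
# so the rolling mean rises by the same amount (the trend slope) every month
steps = rolling_mean.dropna().diff().dropna()
print(steps.round(6).unique())
```

A window shorter or longer than the period would leave part of the seasonal pattern in the rolling mean.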

```python
rolling_mean = df.rolling(window=12).mean()
rolling_std = df.rolling(window=12).std()

df_summary = df.assign(Mean=rolling_mean)
df_summary = df_summary.assign(Std=rolling_std)

sns.lineplot(data=df_summary)
```

Since the mean and standard deviation are not constant, we can conclude that the data is not stationary

We can confirm this with the Augmented Dickey-Fuller (ADF) test. The null hypothesis for the test is that the series is non-stationary; we reject it if the resulting p-value is below 0.05 (or some other threshold)

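```python
from statsmodels.tsa.stattools import adfuller

def print_adf(adf):
    # Reconstructed body (the original was lost): adfuller returns
    # (statistic, p-value, lags used, observations, critical values, icbest)
    print(f'ADF statistic: {adf[0]:.4f}, p-value: {adf[1]:.4f}')

adf = adfuller(df['#Passengers'])
print_adf(adf)
```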



In the result of the ADF test we can see that the p-value is much higher than 0.05, which means we cannot reject the null hypothesis and so we treat the data as non-stationary
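To make the decision rule more concrete, here is a minimal NumPy sketch of the simple (non-augmented) Dickey-Fuller regression. It is not what `adfuller` computes in full: there are no lagged difference terms and no p-value, and the 5% critical value of roughly -2.86 for the constant-only case is hard-coded. `dickey_fuller_stat` and both simulated series are hypothetical.

```python
import numpy as np


def dickey_fuller_stat(y):
    """t-statistic of gamma in: diff(y)[t] = alpha + gamma * y[t-1] + error."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])  # constant and lagged level
    coef, _, _, _ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # covariance of the estimates
    return coef[1] / np.sqrt(cov[1, 1])


CRITICAL_5PCT = -2.86  # approximate large-sample 5% critical value (constant, no trend)

rng = np.random.default_rng(42)
white_noise = rng.normal(size=500)
trending = np.arange(500) + rng.normal(size=500)  # strong upward trend plus noise

for name, series in [("white noise", white_noise), ("trending", trending)]:
    stat = dickey_fuller_stat(series)
    verdict = "reject non-stationarity" if stat < CRITICAL_5PCT else "cannot reject non-stationarity"
    print(f"{name}: statistic {stat:.2f} -> {verdict}")
```

A statistic far below the critical value corresponds to a small p-value, which is the case where the null hypothesis of non-stationarity is rejected.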

Because the data is non-stationary, the next thing we need to do is estimate the trend

```python
df_log = np.log(df)

sns.lineplot(data=df_log)
```

```python
rolling_mean_log = df_log.rolling(window=12).mean()

df_summary = df_log.assign(Mean=rolling_mean_log)

sns.lineplot(data=df_summary)
```

Even with the log there is still some residual trend visible, so we can try subtracting the rolling mean from the original series:

```python
df_diff = df - rolling_mean

sns.lineplot(data=df_diff)
```

```python
rolling_mean_diff = df_diff.rolling(window=12).mean()
rolling_std_diff = df_diff.rolling(window=12).std()

df_summary = df_diff.assign(Mean=rolling_mean_diff)
df_summary = df_summary.assign(Std=rolling_std_diff)

sns.lineplot(data=df_summary)
```

```python
# adfuller expects a 1-D series, so select the column before dropping the NaNs
adf_diff = adfuller(df_diff['#Passengers'].dropna())
print_adf(adf_diff)
```



We can do the same with the log:

```python
df_diff_log = df_log - rolling_mean_log

sns.lineplot(data=df_diff_log)
```

```python
rolling_mean_diff_log = df_diff_log.rolling(window=12).mean()
rolling_std_diff_log = df_diff_log.rolling(window=12).std()

df_summary = df_diff_log.assign(Mean=rolling_mean_diff_log)
df_summary = df_summary.assign(Std=rolling_std_diff_log)

sns.lineplot(data=df_summary)
```

```python
# adfuller expects a 1-D series, so select the column before dropping the NaNs
adf_diff_log = adfuller(df_diff_log['#Passengers'].dropna())
print_adf(adf_diff_log)
```



The ADF p-value for the log diff is less than 0.05, so we reject the null hypothesis and treat the result as stationary

We can also try dividing the original data by the rolling mean:

```python
df_div = df / rolling_mean

sns.lineplot(data=df_div)
```

```python
rolling_mean_div = df_div.rolling(window=12).mean()
rolling_std_div = df_div.rolling(window=12).std()

df_summary = df_div.assign(Mean=rolling_mean_div)
df_summary = df_summary.assign(Std=rolling_std_div)

sns.lineplot(data=df_summary)
```

```python
# adfuller expects a 1-D series, so select the column before dropping the NaNs
adf_div = adfuller(df_div['#Passengers'].dropna())
print_adf(adf_div)
```



The ADF p-value for the division is less than 0.05, so we reject the null hypothesis and treat the result as stationary
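One way to see why dividing by the rolling mean can work better than subtracting it is to try both on a series whose seasonal swings grow with the trend. The sketch below uses a hypothetical noise-free series (a linear trend multiplied by a 12-month seasonal factor) and compares the spread of each detrended version over an early and a late block of four full years.

```python
import numpy as np
import pandas as pd

# Hypothetical series: linear trend multiplied by a 12-month seasonal factor (no noise)
t = np.arange(120)
trend = 100 + 2 * t
seasonal = 1 + 0.2 * np.sin(2 * np.pi * t / 12)
series = pd.Series(trend * seasonal)

rolling_mean = series.rolling(window=12).mean()
diff = series - rolling_mean    # additive detrending
ratio = series / rolling_mean   # multiplicative detrending

# Compare the spread over an early and a late block of four full years
early, late = slice(12, 60), slice(72, 120)
for name, s in [("subtract", diff), ("divide", ratio)]:
    print(f"{name}: std early {s.iloc[early].std():.3f}, std late {s.iloc[late].std():.3f}")
```

On this synthetic series the spread of the subtracted version grows over time, while the spread of the ratio stays roughly constant, which is closer to the constant-variance requirement.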

Next we can try to do a decomposition on the above series since it is stationary:

```python
from statsmodels.tsa.seasonal import seasonal_decompose

# period=12 states the yearly seasonality explicitly instead of relying on frequency inference
decomposition = seasonal_decompose(df_div.dropna(), period=12)
```

```python
trend = decomposition.trend

sns.lineplot(data=trend.dropna())
```

```python
seasonal = decomposition.seasonal

sns.lineplot(data=seasonal.dropna())
```

```python
resid = decomposition.resid

sns.lineplot(data=resid.dropna())
```
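The idea behind a classical additive decomposition can be sketched by hand with NumPy. This is a simplified illustration under stated assumptions, not the `seasonal_decompose` implementation: the series is synthetic (a linear trend plus a repeating zero-mean pattern, no noise), the trend is a centred moving average over one full period, and the seasonal component is the average detrended value for each position in the period.

```python
import numpy as np

period = 12
n = 10 * period
t = np.arange(n)

# Hypothetical additive series: linear trend plus a repeating zero-mean seasonal pattern
true_trend = 50 + 0.5 * t
pattern = np.sin(2 * np.pi * np.arange(period) / period)
series = true_trend + np.tile(pattern, n // period)

# Trend: centred moving average spanning one full period (2x12 MA for an even period)
weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
half = period // 2
trend = np.full(n, np.nan)
trend[half:n - half] = np.convolve(series, weights, mode="valid")

# Seasonal: average the detrended values for each position in the period
detrended = series - trend
seasonal_means = np.array([np.nanmean(detrended[k::period]) for k in range(period)])
seasonal_means -= seasonal_means.mean()  # centre the seasonal component on zero
seasonal = np.tile(seasonal_means, n // period)

# Residual: whatever is left after removing trend and seasonality
resid = series - trend - seasonal
print(np.nanmax(np.abs(resid)))
```

Because the synthetic series contains no noise, the residual should be numerically close to zero wherever the trend is defined; the first and last six values are NaN because the centred window does not fit there.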