# Time Series Stationarity

Created: Testing for Stationarity in Time Series Data

# Components of Time Series data

- Trend
- Seasonality
- Irregularity
- Cyclicality

# When not to use Time Series Analyis

- Values are constant - it's pointless
- Values are in the form of functions - just use the function

# Stationarity

- Constant mean
- Constant variance
- Autovariance that does not depend on time

A stationary series has a high probability to follow the same pattern in future

## Stationarity Tests

- Rolling Statistics - moving average, moving variance, visualization
- ADCF Test

## ARIMA

ARIMA is a common model for analysis

The ARIMA model has the following parameters::

- P - Auto Regressive (AR)
- d - Integration (I)
- Q - Moving Average (MA)

# Applying the Above

```
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
for filename in filenames:
print(os.path.join(dirname, filename))
# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session
```

```
import seaborn as sns
```

```
df = pd.read_csv('/kaggle/input/air-passengers/AirPassengers.csv')
df.head()
```

Month | #Passengers | |
---|---|---|

0 | 1949-01 | 112 |

1 | 1949-02 | 118 |

2 | 1949-03 | 132 |

3 | 1949-04 | 129 |

4 | 1949-05 | 121 |

```
df['Month'] = pd.to_datetime(df['Month'], infer_datetime_format=True)
df = df.set_index(['Month'])
df.head()
```

#Passengers | |
---|---|

Month | |

1949-01-01 | 112 |

1949-02-01 | 118 |

1949-03-01 | 132 |

1949-04-01 | 129 |

1949-05-01 | 121 |

```
sns.lineplot(data=df)
```

```
<AxesSubplot:xlabel='Month'>
```

```
<Figure size 432x288 with 1 Axes>
```

In the above we can see that there is an upward trend as well as some seasonality

Next, we can check some summary statistics using a rolling mean approach

## Rolling Averages

Note that for the rolling functions we use a window of 12, this is because the data has a seasonality of 12 months

```
rolling_mean = df.rolling(window=12).mean()
rolling_std = df.rolling(window=12).std()
df_summary = df.assign(Mean=rolling_mean)
df_summary = df_summary.assign(Std=rolling_std)
sns.lineplot(data=df_summary)
```

```
<AxesSubplot:xlabel='Month'>
```

```
<Figure size 432x288 with 1 Axes>
```

Since the mean and standard deviation are not constant we can conclude that the data is not stationary

## ADF Test

The null hypothesis for the test is that the series is non-stationary, we reject it if the resulting probability > 0.05 (or some other threshold)

```
from statsmodels.tsa.stattools import adfuller
```

```
def print_adf(adf):
print('ADF test statistic', adf[0])
print('p-value', adf[1])
print('Lags used', adf[2])
print('Observations used', adf[3])
print('Critical values', adf[4])
```

```
adf = adfuller(df['#Passengers'])
print_adf(adf)
```

In the result of the ADF test we can see that the p-value is much higher than 0.05 which means that the data is not stationary

Because the data is non-stationary the next think we need to do is estimate the trend

```
df_log = np.log(df)
sns.lineplot(data=df_log)
```

```
<AxesSubplot:xlabel='Month'>
```

```
<Figure size 432x288 with 1 Axes>
```

```
rolling_mean_log = df_log.rolling(window=12).mean()
df_summary = df_log.assign(Mean=rolling_mean_log)
sns.lineplot(data=df_summary)
```

```
<AxesSubplot:xlabel='Month'>
```

```
<Figure size 432x288 with 1 Axes>
```

Using the log there is still some residual effect visible, we can try taking a diff:

```
df_diff = df - rolling_mean
sns.lineplot(data=df_diff)
```

```
<AxesSubplot:xlabel='Month'>
```

```
<Figure size 432x288 with 1 Axes>
```

```
rolling_mean_diff = df_diff.rolling(window=12).mean()
rolling_std_diff = df_diff.rolling(window=12).std()
df_summary = df_diff.assign(Mean=rolling_mean_diff)
df_summary = df_summary.assign(Std=rolling_std_diff)
sns.lineplot(data=df_summary)
```

```
<AxesSubplot:xlabel='Month'>
```

```
<Figure size 432x288 with 1 Axes>
```

```
adf_diff = adfuller(df_diff.dropna())
print_adf(adf_diff)
```

We can do the same with the log:

```
df_diff_log = df_log - rolling_mean_log
sns.lineplot(data=df_diff_log)
```

```
<AxesSubplot:xlabel='Month'>
```

```
<Figure size 432x288 with 1 Axes>
```

```
rolling_mean_diff_log = df_diff_log.rolling(window=12).mean()
rolling_std_diff_log = df_diff_log.rolling(window=12).std()
df_summary = df_diff_log.assign(Mean=rolling_mean_diff_log)
df_summary = df_summary.assign(Std=rolling_std_diff_log)
sns.lineplot(data=df_summary)
```

```
<AxesSubplot:xlabel='Month'>
```

```
<Figure size 432x288 with 1 Axes>
```

```
adf_diff_log = adfuller(df_diff_log.dropna())
print_adf(adf_diff_log)
```

The ADF for the log diff is less than 0.05 so the result is stationary

We can also try a divide using the the original data and the rolling mean:

```
df_div = df / rolling_mean
sns.lineplot(data=df_div)
```

```
<AxesSubplot:xlabel='Month'>
```

```
<Figure size 432x288 with 1 Axes>
```

```
rolling_mean_div = df_div.rolling(window=12).mean()
rolling_std_div = df_div.rolling(window=12).std()
df_summary = df_div.assign(Mean=rolling_mean_div)
df_summary = df_summary.assign(Std=rolling_std_div)
sns.lineplot(data=df_summary)
```

```
<AxesSubplot:xlabel='Month'>
```

```
<Figure size 432x288 with 1 Axes>
```

```
adf_div = adfuller(df_div.dropna())
print_adf(adf_div)
```

The ADF for the division is less than 0.05 so the result is stationary

Next we can try to do a decomposition on the above series since it is stationary:

```
from statsmodels.tsa.seasonal import seasonal_decompose
```

```
decomposition = seasonal_decompose(df_div.dropna())
```

```
trend = decomposition.trend
sns.lineplot(data=trend.dropna())
```

```
<AxesSubplot:xlabel='Month', ylabel='trend'>
```

```
<Figure size 432x288 with 1 Axes>
```

```
seasonal = decomposition.seasonal
sns.lineplot(data=seasonal.dropna())
```

```
<AxesSubplot:xlabel='Month', ylabel='seasonal'>
```

```
<Figure size 432x288 with 1 Axes>
```

```
resid = decomposition.resid
sns.lineplot(data=resid.dropna())
```

```
<AxesSubplot:xlabel='Month', ylabel='resid'>
```

```
<Figure size 432x288 with 1 Axes>
```

```
```