Forecasting is the technique of using historical data to make informed, predictive estimates.
It is an important task in most organizations today.
What is Time series data?
As the name suggests, time-series data is data collected at regular intervals. The intervals could be hourly, daily, weekly, or monthly.
Time-series data depends on time, so the basic assumption of the linear regression model, that observations are independent, does not hold here. In a time series, each observation depends on the previous ones.
Time-series data also has a component called trend, which can be increasing or decreasing, and most time series show some form of seasonality.
How to determine if the data is time-series data?
Time-series behavior can usually be identified through visual aids, but there are also statistical ways to characterize a series. One of them is to check whether the series is stationary.
Stationary Series
A stationary series is easier to predict: its mean, variance, and covariance are constant over time. There are ways to convert a non-stationary series to a stationary one.
The Augmented Dickey-Fuller (ADF) test is the most popular test for checking the stationarity of a series. It tests for the presence of a unit root in the series.
Null Hypothesis: The series has a unit root
Alternate Hypothesis: The series does not have a unit root.
The ADF test returns a test statistic, a p-value, and critical values. (Note: the exact numbers vary from case to case.)
If the test statistic is less than the critical value, we reject the null hypothesis and conclude that the series is stationary.
Another popular test, to confirm the stationarity of the series is Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test.
Null Hypothesis: The process is trend stationary
Alternate Hypothesis: The series has a unit root
If the test statistic is greater than the critical value, we reject the null hypothesis, meaning the series is not stationary.
If the test statistic is less than the critical value, we fail to reject the null hypothesis, meaning the series is stationary.
In the reported results, the test statistic (0.0354) is less than the critical value (0.7390), so we fail to reject the null hypothesis: the series is trend stationary.
It is recommended to perform both tests before proceeding to model the time-series data. Sometimes the two tests give contradictory results. This can happen because the ADF test's alternate hypothesis is linear or difference stationarity, whereas the KPSS test identifies trend stationarity in a series.
How many types of stationarity exist?
Strict Stationary: A strictly stationary series satisfies the mathematical condition that mean, variance, and covariance are not functions of time.
Trend Stationary: The mean trend is deterministic; the series has no unit root but does have a trend component. Removing the trend component converts the non-stationary series to a stationary one.
Difference Stationary: The mean trend is stochastic; the series can be made stationary by differencing. When d differences are required to make a series stationary, the series is said to be integrated of order d, denoted I(d).
Deterministic trend: The series always reverts to the trend in the long term, meaning the effect of shocks is eventually eliminated. The forecast intervals have a constant width.
Stochastic trend: The series never recovers from a shock. The forecast intervals grow over time.
How to convert non-stationary series to stationary series?
We have seen that it is critical to convert a non-stationary series to a stationary one. There are two major causes of non-stationary behavior:
trend and seasonality.
Trend is defined as a mean that varies over time.
Seasonality is defined as variation that repeats at specific time intervals.
There are ways to eliminate trend and seasonality. The underlying principle is to quantify the trend and seasonality in the series, remove them to obtain a stationary series, model that series, and then add the trend and seasonality back to the forecasted values.
Removing trend:
1. Transformation:
A quick way to reduce a trend is transformation. A transformation penalizes higher values more than smaller ones. Some common transformations are:
Logarithmic transformation
Square root transformation
Cube root transformation
2. Aggregation:
Averaging over a time period, such as weekly or monthly
3. Smoothing
Rolling averages, such as the moving average and the exponentially weighted moving average
Moving Average:
The average value of n consecutive observations. This can be easily computed with pandas.
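A minimal sketch of a moving average with pandas (the values and window size are illustrative):

```python
import pandas as pd

s = pd.Series([10, 12, 13, 12, 15, 16, 18, 17, 20, 22])

# 3-period simple moving average; the first 2 entries are NaN
# because a full window is not yet available
rolling_mean = s.rolling(window=3).mean()

# Subtracting the rolling mean removes the local trend
detrended = s - rolling_mean
```

The detrended series can then be checked again for stationarity.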
Exponentially Weighted Moving Average
The weights are assigned so that the most recent observation gets the largest weight and older observations get progressively smaller weights. This can be easily computed with pandas.
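A sketch of the exponentially weighted moving average with pandas (the data and the half-life parameter are illustrative choices):

```python
import pandas as pd

s = pd.Series([10, 12, 13, 12, 15, 16, 18, 17, 20, 22])

# Exponentially weighted moving average: weights decay geometrically,
# so recent observations dominate. halflife=3 means a weight halves
# every 3 periods.
ewma = s.ewm(halflife=3).mean()

detrended = s - ewma
```

Unlike the simple rolling mean, the EWMA is defined from the first observation onward, so there are no leading NaN values.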
4. Polynomial Fitting
In this method, we fit a regression model of the series on time (a polynomial) and subtract the fitted trend.
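A sketch of polynomial detrending with NumPy; the synthetic quadratic-trend series below is an assumption for illustration:

```python
import numpy as np

# Illustrative series with a quadratic trend plus noise
rng = np.random.default_rng(1)
t = np.arange(100)
series = 0.05 * t**2 + 2 * t + rng.normal(scale=5.0, size=100)

# Fit a degree-2 polynomial in time and subtract the fitted trend
coeffs = np.polyfit(t, series, deg=2)
trend = np.polyval(coeffs, t)
residual = series - trend  # approximately stationary
```

The residual series keeps only the fluctuations around the fitted trend, which is what gets modeled and forecast.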
Removing Seasonality
1. Differencing: We calculate the difference between consecutive values (first differencing). For seasonal differencing, each observation is subtracted from the observation at the same point in the previous season; for example, an observation taken in January is subtracted from the observation taken the previous January.
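Both kinds of differencing are one-liners in pandas. A sketch on a synthetic monthly series (the dates and values are illustrative):

```python
import pandas as pd

# Illustrative monthly series: 3 years of data
idx = pd.date_range("2020-01-01", periods=36, freq="MS")
s = pd.Series(range(36), index=idx, dtype=float)

first_diff = s.diff()       # x[t] - x[t-1], removes a linear trend
seasonal_diff = s.diff(12)  # x[t] - x[t-12], e.g. Jan minus previous Jan
```

The first d values of a lag-d difference are NaN and are dropped before modeling.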
2. Decomposition: Thanks to statsmodels, one can easily decompose a series into its trend and seasonal components.
Now that the non-stationary series has been converted to a stationary series, we can start modeling the time-series data.
There are different ways to model time-series data. Here are a few:
1. Auto Regression
2. Moving Average
3. Auto Regressive Moving Average
4. Auto Regressive Integrated Moving Average
5. Seasonal Auto Regressive Integrated Moving Average
6. Seasonal Auto Regressive Integrated Moving Average with Exogenous Regressors
7. Vector Auto Regression
8. Vector Auto Regression Moving Average
9. Vector Auto Regression Moving Average with Exogenous Regressors
10. Simple Exponential Smoothing
11. Holt Winter's Exponential Smoothing
12. FBProphet