ARIMA

ARIMA stands for AutoRegressive Integrated Moving Average. ARIMA models describe phenomena that evolve over time and predict their future values. Run them in Excel using the XLSTAT statistical add-on.

XLSTAT offers a wide selection of ARIMA models such as ARMA (Autoregressive Moving Average), ARIMA (Autoregressive Integrated Moving Average) or SARIMA (Seasonal Autoregressive Integrated Moving Average). This way, you can easily fit an ARIMA model for time series forecasting without Python or R. These models can be used in applied machine learning in various fields, such as finance, to predict the evolution of stock prices, or meteorology, to predict temperatures.

How does ARIMA work?

The models of the ARIMA family give a compact representation of phenomena that vary with time, and predict future values with a confidence interval around the predictions. They are better suited to time series data than a classical linear regression model.

The mathematical notation of ARIMA models differs from one author to another. The differences mostly concern the sign of the coefficients. XLSTAT uses the most common notation, which is also the one used by most software. If Xt denotes a series with mean µ, and the series is assumed to follow an ARIMA(p,d,q)(P,D,Q)s model, we can write:

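\[
Y_t = (1 - B)^d (1 - B^s)^D X_t - \mu, \qquad
\phi(B)\,\Phi(B^s)\,Y_t = \theta(B)\,\Theta(B^s)\,Z_t, \qquad
Z_t \sim N(0, \sigma^2)
\]

with

\[
\phi(z) = 1 - \sum_{i=1}^{p} \phi_i z^i, \qquad
\Phi(z) = 1 - \sum_{i=1}^{P} \Phi_i z^i, \qquad
\theta(z) = 1 + \sum_{i=1}^{q} \theta_i z^i, \qquad
\Theta(z) = 1 + \sum_{i=1}^{Q} \Theta_i z^i
\]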

B is the backshift operator (B applied to Xt gives Xt-1).
p is the order of the autoregressive part of the model.
q is the order of the moving average part of the model.
d is the differencing order of the model.
D is the differencing order of the seasonal part of the model.
s is the period of the model (for example, 12 if the data are monthly and show a yearly periodicity).
P is the order of the autoregressive seasonal part of the model.
Q is the order of the moving average seasonal part of the model.
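The differencing step that turns Xt into Yt can be sketched in a few lines of Python. This is only an illustration of the operator (1 – B)^d (1 – B^s)^D, not XLSTAT's implementation; the function names and the quarterly series are hypothetical.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag) once: each value minus the value `lag` steps earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def to_stationary(x, d=0, D=0, s=0, mu=0.0):
    """Compute Y_t = (1 - B)^d (1 - B^s)^D X_t - mu for a list of numbers."""
    if D > 0 and s < 1:
        raise ValueError("seasonal differencing needs a period s >= 1")
    y = list(x)
    for _ in range(D):  # seasonal differencing, applied D times at lag s
        y = difference(y, lag=s)
    for _ in range(d):  # ordinary differencing, applied d times at lag 1
        y = difference(y, lag=1)
    return [value - mu for value in y]


# Hypothetical quarterly series: a linear trend plus a repeating seasonal pattern.
x = [10, 14, 12, 18, 14, 18, 16, 22, 18, 22, 20, 26]
print(to_stationary(x, d=1, D=1, s=4))
```

Each ordinary difference shortens the series by one value and each seasonal difference shortens it by s values, which is why fewer observations are available for fitting than were supplied.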

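Remark 1: the Yt process is causal if and only if the autoregressive polynomial φ(z) is nonzero for any z such that |z|≤1. The analogous condition on the moving average polynomial, θ(z)≠0 for any z such that |z|≤1, makes the process invertible.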
Remark 2: if D=0, the model is an ARIMA(p,d,q) model. In that case, P, Q and s are treated as zero.
Remark 3: if d=0 and D=0, the model simplifies to an ARMA(p,q) model. In this case, the series is assumed to be stationary.
Remark 4: if d=0, D=0 and q=0, the model simplifies to an AR(p) model.
Remark 5: if d=0, D=0 and p=0, the model simplifies to an MA(q) model.
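The condition of Remark 1 on the autoregressive polynomial can be checked numerically. The sketch below uses the step-down (inverse Durbin-Levinson) recursion, a technique chosen here for illustration and not taken from XLSTAT: the polynomial 1 – Σ φi z^i has no zero inside or on the unit circle exactly when every partial autocorrelation recovered by the recursion is smaller than 1 in absolute value. The function name is hypothetical.

```python
def is_causal_ar(phi):
    """Return True if 1 - phi[0]*z - ... - phi[p-1]*z^p has no zero with |z| <= 1.

    `phi` lists the AR coefficients in the sign convention used above.
    """
    a = list(phi)
    while a:
        r = a[-1]  # partial autocorrelation at the current order
        if abs(r) >= 1.0:
            return False
        k = len(a)
        # Step down from the order-k coefficients to the order-(k-1) ones.
        a = [(a[j] + r * a[k - 2 - j]) / (1.0 - r * r) for j in range(k - 1)]
    return True


print(is_causal_ar([0.5, 0.3]))   # coefficients of a stationary AR(2)
print(is_causal_ar([1.5, -0.5]))  # (1 - z)(1 - 0.5 z): contains a unit root
```

A series whose fitted AR coefficients fail this check usually needs more differencing (a larger d or D) before an ARMA model is appropriate.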

Explanatory variables

XLSTAT allows you to take into account explanatory variables through a linear model. Three different approaches are possible:

OLS: A linear regression model is fitted using the classical linear regression approach, then the residuals are modeled using an (S)ARIMA model.
CO-LS: If d is not zero, or if D and s are not zero, the data (including the explanatory variables) are differenced, then the corresponding ARMA model is fitted at the same time as the linear model coefficients using the Cochrane and Orcutt (1949) approach.
GLS: A linear regression model is fitted, then the residuals are modeled using an (S)ARIMA model, then we loop back to the regression step, in order to improve the likelihood of the model by changing the regression coefficients using a Newton-Raphson approach.

Note: if no differencing is requested (d=0 and D=0), and if there are no explanatory variables in the model, the constant of the model is estimated using CO-LS.
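The two stages of the OLS approach can be illustrated as follows. This is a simplified sketch, not XLSTAT's procedure: it assumes a single explanatory variable, and it replaces the full (S)ARIMA fit of the residuals with the lag-1 autocorrelation, which is the Yule-Walker estimate of an AR(1) coefficient. The function names and the data are hypothetical.

```python
def ols_fit(x, y):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def lag1_autocorrelation(e):
    """Lag-1 sample autocorrelation (Yule-Walker estimate of an AR(1) coefficient)."""
    n = len(e)
    mean_e = sum(e) / n
    c0 = sum((v - mean_e) ** 2 for v in e)
    c1 = sum((e[t] - mean_e) * (e[t - 1] - mean_e) for t in range(1, n))
    return c1 / c0


# Hypothetical data: y depends roughly linearly on x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5.4, 8.5, 11.2, 13.7, 16.6, 19.8, 23.1, 26.3]

# Stage 1: classical linear regression.
intercept, slope = ols_fit(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Stage 2: model the residuals as a time series (AR(1) stand-in for (S)ARIMA).
print(intercept, slope, lag1_autocorrelation(residuals))
```

In the GLS approach the two stages are repeated, with the regression coefficients adjusted at each pass, rather than run once as here.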

Results of the ARIMA analysis in XLSTAT

Let’s see how to interpret ARIMA model results.

Summary statistics: This table displays, for the selected variables, the number of observations, the number of missing values, the number of non-missing values, the mean and the standard deviation (unbiased).

If a preliminary estimation and an optimization have been requested, the results for the preliminary estimation are displayed first, followed by the results after the optimization. If initial coefficients have been entered, the results corresponding to these coefficients are displayed first.

Goodness of fit coefficients:

Observations: The number of observations used for fitting the model.
SSE: Sum of Squares of Errors. This statistic is minimized if the "Least Squares" option has been selected for the optimization.
MAPE: The Mean Absolute Percentage Error measures the quality of the fit independently of the scale of the data. Because it averages absolute rather than squared errors, it does not give extra weight to larger errors.
WN variance: The white noise variance is equal to the SSE divided by N. In some software, this statistic is named sigma2 (sigma-square).
WN variance estimate: This statistic is usually equal to the previous one. In the case of a preliminary estimation using the Yule-Walker or Burg’s algorithms, a slightly different estimate is displayed.
-2Log(Like.): This statistic is minimized if the "Likelihood" option has been selected for the optimization. It is equal to minus 2 times the natural logarithm of the likelihood.
FPE: Akaike’s Final Prediction Error. This criterion is adapted to autoregressive models.
AIC: The Akaike Information Criterion.
AICC: This criterion has been suggested by Brockwell (Akaike Information Criterion Corrected).
SBC: Schwarz’s Bayesian Criterion.
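Several of these statistics can be reproduced from the data and the fitted values. The sketch below uses common textbook definitions, which are assumptions here: XLSTAT's exact formulas (in particular its likelihood, which this sketch approximates by treating the residuals as independent Gaussian errors) may differ. The function name and the parameter count n_params are hypothetical.

```python
import math


def fit_statistics(data, predictions, n_params):
    """Goodness-of-fit statistics from data, fitted values and a parameter count.

    Assumes no zero in `data` (needed for MAPE) and len(data) > n_params + 1.
    """
    residuals = [x - p for x, p in zip(data, predictions)]
    n = len(residuals)
    sse = sum(e * e for e in residuals)
    mape = 100.0 / n * sum(abs(e / x) for e, x in zip(residuals, data))
    wn_variance = sse / n  # SSE divided by N, as stated above
    # Gaussian -2 log-likelihood evaluated at sigma^2 = SSE / N (approximation).
    neg2loglik = n * (math.log(2.0 * math.pi * wn_variance) + 1.0)
    return {
        "SSE": sse,
        "MAPE": mape,
        "WN variance": wn_variance,
        "-2Log(Like.)": neg2loglik,
        "AIC": neg2loglik + 2.0 * n_params,
        "AICC": neg2loglik + 2.0 * n_params * n / (n - n_params - 1),
        "SBC": neg2loglik + n_params * math.log(n),
    }


# Hypothetical observed values and one-step predictions.
stats = fit_statistics([10.0, 20.0, 40.0], [11.0, 18.0, 40.0], n_params=1)
for name, value in stats.items():
    print(name, round(value, 4))
```

AIC, AICC and SBC all add a penalty to -2Log(Like.) that grows with the number of parameters, so when comparing candidate models fitted to the same data, lower values are preferred.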

Model parameters:

The first table of parameters gives the coefficients of the linear model fitted to the data (a constant if no explanatory variable was selected).

The next table gives the estimate of each coefficient of each polynomial, as well as the standard deviation obtained either directly from the estimation method (preliminary estimation), or from the Fisher information matrix (Hessian). The asymptotic standard deviations are also computed. For each coefficient and each standard deviation, a confidence interval is displayed. The coefficients are identified as follows:

AR(i): coefficient of order i of the autoregressive polynomial φ(z).
SAR(i): coefficient of order i of the seasonal autoregressive polynomial Φ(z).
MA(i): coefficient of order i of the moving average polynomial θ(z).
SMA(i): coefficient of order i of the seasonal moving average polynomial Θ(z).

Data, Predictions and Residuals: This table displays the data, the corresponding predictions computed with the ARIMA model, and the residuals. If the user requested it, predictions are computed for the validation data and forecasts for future values. Standard deviations and confidence intervals are computed for validation predictions and forecasts.
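To show how forecasts, standard deviations and confidence intervals relate, here is a sketch for the simplest case, an AR(1) model with known parameters. It is an illustration under stated assumptions, not XLSTAT's computation: the parameters are treated as exact (estimation uncertainty is ignored), the interval is a normal approximation with z = 1.96 for 95%, and the function name and input values are hypothetical.

```python
import math


def ar1_forecast(last_value, phi, mu, sigma2, steps, z=1.96):
    """Forecasts of an AR(1) model with mean mu and white noise variance sigma2.

    Returns one tuple (h, forecast, sd, lower, upper) per horizon h = 1..steps.
    """
    rows = []
    variance = 0.0
    for h in range(1, steps + 1):
        # The h-step forecast decays geometrically toward the mean.
        forecast = mu + phi ** h * (last_value - mu)
        # Forecast error variance: sigma2 * (1 + phi^2 + ... + phi^(2(h-1))).
        variance += sigma2 * phi ** (2 * (h - 1))
        sd = math.sqrt(variance)
        rows.append((h, forecast, sd, forecast - z * sd, forecast + z * sd))
    return rows


for row in ar1_forecast(last_value=8.0, phi=0.5, mu=0.0, sigma2=1.0, steps=3):
    print(row)
```

The standard deviation grows with the horizon, so the confidence interval widens the further ahead the forecast is made.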

Charts: Two charts are displayed. The first chart displays the data, the corresponding values predicted by the model, and the predictions for the validation and/or forecast time steps. The second chart is a bar chart of the residuals.