arimaFit#
Purpose#
Fit ARIMA, SARIMA, or ARIMAX models. Automatically selects orders when order is not specified.
Format#
- result = arimaFit(y)#
- result = arimaFit(y, order=p|d|q)
- result = arimaFit(y, order=p|d|q, sorder=P|D|Q, season=s)
- result = arimaFit(y, xreg=X)
- result = arimaFit(y, ctl)
- Parameters:
y (Nx1 vector) – time series data.
ctl (struct) –
Optional input, an instance of an
arimaControlstructure. An instance is initialized by callingarimaControlCreate()and the following members can be set:ctl.max_p
Scalar, maximum AR order for auto-selection. Default = 5.
ctl.max_q
Scalar, maximum MA order for auto-selection. Default = 5.
ctl.max_d
Scalar, maximum differencing order. Default = 2.
ctl.max_bp
Scalar, maximum seasonal AR order. Default = 2.
ctl.max_bq
Scalar, maximum seasonal MA order. Default = 2.
ctl.max_bd
Scalar, maximum seasonal differencing order. Default = 1.
ctl.max_order
Scalar, maximum total order (p+q+P+Q). Default = 5.
ctl.ic
String, information criterion for auto-selection.
"aicc"Corrected Akaike. (Default)
"aic"Akaike.
"bic"Bayesian (Schwarz).
ctl.stepwise
Scalar, search strategy for auto-selection.
1
Stepwise search (faster, recommended). (Default)
0
Exhaustive search (slower, guaranteed global optimum).
ctl.method
String, estimation method.
"css-ml"CSS for starting values, then ML refinement. (Default)
"ml"Maximum likelihood only.
ctl.include
String, deterministic terms.
"auto"Automatic: mean if d=0, drift if d=1, none if d>=2. (Default)
"mean"Force include mean.
"drift"Force include drift.
"none"No deterministic term.
ctl.quiet
Scalar, output control. Set to 1 to suppress printed output. Default = 0.
ctl.max_iter
Scalar, maximum optimizer iterations. Default = 1000.
ctl.tol
Scalar, convergence tolerance. Default = 1e-8.
order (3x1 vector {p, d, q}) – Optional keyword, ARIMA order. If omitted, orders are automatically selected by minimizing the information criterion specified in ctl.ic. Individual elements may be set to -1 to auto-select that dimension only. E.g.,
order = -1|1|-1fixes d=1 and auto-selects p and q.sorder (3x1 vector {P, D, Q}) – Optional keyword, seasonal ARIMA order. If omitted with season set, seasonal orders are auto-selected.
season (scalar) – Optional keyword, seasonal period (e.g., 12 for monthly, 4 for quarterly). Required for seasonal models.
xreg (NxM matrix) – Optional keyword, exogenous regressors. Fits a regression with ARIMA errors: \(y_t = X_t'\beta + \eta_t\) where \(\eta_t\) follows an ARIMA process.
xreg_names (Mx1 string array) – Optional keyword, column names for xreg. If omitted, defaults to
"X1","X2", etc.quiet (scalar) – Optional keyword, set to 1 to suppress printed output. Overrides ctl.quiet.
- Returns:
result (struct) –
An instance of an
arimaResultstructure containing:result.order
3x1 vector, estimated or specified ARIMA order {p, d, q}.
result.sorder
3x1 vector, seasonal order {P, D, Q}. Empty matrix if non-seasonal.
result.season
Scalar, seasonal period. 0 if non-seasonal.
result.include_mean
Scalar, 1 if mean/drift was included, 0 otherwise.
result.coefs
Kx1 vector, estimated coefficients in order: AR, MA, SAR, SMA, Mean/Drift, Xreg.
result.se
Kx1 vector, standard errors.
result.tstat
Kx1 vector, t-statistics.
result.pval
Kx1 vector, two-sided p-values.
result.ci_lower
Kx1 vector, lower 95% confidence bounds.
result.ci_upper
Kx1 vector, upper 95% confidence bounds.
result.coef_names
Kx1 string array, coefficient labels (e.g.,
"AR(1)","MA(1)","Mean").result.sigma2
Scalar, estimated innovation variance.
result.loglik
Scalar, maximized log-likelihood.
result.aic
Scalar, Akaike information criterion.
result.aicc
Scalar, corrected Akaike information criterion.
result.bic
Scalar, Bayesian information criterion (Schwarz).
result.residuals
Nx1 vector, standardized residuals.
result.fitted
Nx1 vector, in-sample fitted values.
result.n_obs
Scalar, number of observations used in estimation.
result.converged
Scalar, 1 if optimizer converged, 0 otherwise.
result.y
Nx1 vector, original series (stored for use by
arimaForecast()).result.xreg
NxM matrix, original exogenous regressors. Empty matrix if none.
result.ar_coefs
Vector, expanded AR polynomial coefficients (used internally by
arimaForecast()).result.ma_coefs
Vector, expanded MA polynomial coefficients (used internally by
arimaForecast()).result.xreg_coefs
Mx1 vector, regression coefficients. Empty matrix if no regressors.
result.intercept
Scalar, mean or drift value. Missing if none.
result.vcov
KxK matrix, variance-covariance matrix of estimated coefficients.
Model#
ARIMA(p,d,q): After differencing \(d\) times, the series \(w_t = (1-L)^d y_t\) follows a stationary ARMA(p,q) process:
where \(\varepsilon_t \sim N(0, \sigma^2)\). In backshift operator notation:
SARIMA(p,d,q)(P,D,Q)[s]: Adds seasonal differencing and seasonal ARMA terms:
where \(\Phi(L^s) = 1 - \Phi_1 L^s - \cdots - \Phi_P L^{Ps}\) and \(\Theta(L^s) = 1 + \Theta_1 L^s + \cdots + \Theta_Q L^{Qs}\).
ARIMAX (regression with ARIMA errors): When exogenous regressors \(X_t\) are provided:
This is a regression with ARIMA errors model (Hyndman & Athanasopoulos 2021, Ch. 10), not a transfer function model. The distinction matters: the AR/MA structure applies to the regression residuals, not directly to \(y_t\). Algorithm ———
Estimation (CSS-ML):
Conditional sum of squares (CSS): Condition on the first \(\max(p, s \cdot P)\) observations and minimize the sum of squared one-step-ahead prediction errors. This provides fast initial parameter estimates. Complexity: \(O(N)\).
Maximum likelihood refinement (ML): Starting from the CSS estimates, maximize the exact Gaussian log-likelihood via the state-space representation and Kalman filter. The likelihood is:
\[\log \mathcal{L} = -\frac{N}{2} \log(2\pi) - \frac{1}{2} \sum_{t=1}^{N} \left( \log f_t + \frac{v_t^2}{f_t} \right)\]where \(v_t\) and \(f_t\) are the innovation and its variance from the Kalman filter. Optimization uses L-BFGS-B with parameter transforms to enforce stationarity and invertibility.
Auto-selection (stepwise):
When order is omitted, the Hyndman-Khandakar (2008) stepwise algorithm is used:
Determine \(d\) via KPSS unit root tests (Kwiatkowski et al. 1992).
Determine \(D\) via OCSB seasonal unit root tests (Osborn, Chui, Smith & Birchenhall 1988), if seasonal.
Fit an initial model, then search neighboring orders in a stepwise fashion, minimizing AICc (default) or the criterion in ctl.ic.
Total models evaluated is typically 15-30 (vs. hundreds for exhaustive search).
Set ctl.stepwise = 0 for exhaustive search over all \((p, q, P, Q)\) combinations up to ctl.max_order.
Examples
——–
Auto ARIMA on Airline Passengers#
new;
library timeseries;
y = loadd(getGAUSSHome("pkgs/timeseries/examples/airline.dat"), "passengers");
// Automatic ARIMA — selects order via AICc
result = arimaFit(y);
Output:
================================================================================
Model: ARIMA(1,1,0) Observations: 144
Method: CSS-ML Log-Likelihood: -504.920
AIC: 1015.84 AICc: 1016.02
BIC: 1024.78 Sigma^2: 132.428
================================================================================
Coef Std.Err. t-stat p-value [0.025 0.975]
--------------------------------------------------------------------------------
AR(1) 0.3185 0.0832 3.828 0.000 0.156 0.481
Drift 2.6672 0.7781 3.428 0.001 1.142 4.193
================================================================================
Ljung-Box(10): Q=65.21 p=0.000
Jarque-Bera: JB=1.83 p=0.401
================================================================================
Seasonal ARIMA on Monthly Data#
The classic Box-Jenkins airline model — SARIMA(0,1,1)(0,1,1)[12]:
new;
library timeseries;
y = loadd(getGAUSSHome("pkgs/timeseries/examples/airline.dat"), "passengers");
// Auto SARIMA with season=12
result = arimaFit(y, 12);
Selects SARIMA(0,1,1)(0,1,1)[12] — the same model identified by Box & Jenkins (1970) on this dataset. The seasonal MA(1) coefficient captures the within-year pattern, while regular differencing and seasonal differencing handle trend and annual cycles.
Fixed Order with Diagnostics#
new;
library timeseries;
y = loadd(getGAUSSHome("pkgs/timeseries/examples/airline.dat"), "passengers");
// Force SARIMA(1,1,1)(0,1,1)[12]
result = arimaFit(y, 12, 1, 1, 1, 0, 1, 1);
Check the Ljung-Box statistic for residual autocorrelation: p > 0.05 indicates no remaining serial correlation. Check Jarque-Bera for normality: p > 0.05 indicates Gaussian residuals.
ARIMAX: GDP with Leading Indicators#
new;
library timeseries;
y = loadd(getGAUSSHome("pkgs/timeseries/examples/macro.dat"), "gdp");
X = loadd(getGAUSSHome("pkgs/timeseries/examples/macro.dat"), "cpi + ffr");
// Regression with ARIMA errors
result = arimaFit(y, xreg=X, xreg_names="CPI"$|"FFR");
The exogenous coefficients are reported alongside the ARIMA parameters.
Use arimaForecast() with xreg=X_future to produce forecasts conditional
on projected regressor values.
Partial Auto-Selection#
Fix \(d = 1\) but auto-select \(p\) and \(q\):
new;
library timeseries;
y = loadd(getGAUSSHome("pkgs/timeseries/examples/airline.dat"), "passengers");
// Fix d=1, auto-select p and q
// season=12, fix d=1, auto-select p and q (use -1)
result = arimaFit(y, 12, -1, 1, -1);
Using BIC for Model Selection#
new;
library timeseries;
y = loadd(getGAUSSHome("pkgs/timeseries/examples/airline.dat"), "passengers");
ctl = arimaControlCreate();
ctl.ic = "bic";
result = arimaFit(y, ctl, season=12);
BIC penalizes model complexity more than AICc, typically selecting more parsimonious models. Troubleshooting —————
Optimizer did not converge:
Increase ctl.max_iter (default 1000) or switch to ctl.method = "css" for a
quick approximate estimate. Non-convergence often indicates the model is
over-parameterized for the data — try a simpler order.
Near-unit-root MA coefficient: If \(|\theta_q|\) or \(|\Theta_Q|\) is close to 1.0, the model is near the boundary of invertibility. The MA polynomial is nearly non-invertible, which causes numerical issues. Solutions:
Increase the differencing order by 1 (replace MA near-unit-root with a difference).
Use
ctl.method = "ml"instead of"css-ml"— CSS initialization can sometimes find a near-boundary starting point that ML refinement cannot escape.
CSS and ML give different answers:
CSS conditions on initial values and optimizes a different objective than exact ML.
Small discrepancies (< 0.02 in coefficients) are normal. Large discrepancies suggest
the likelihood surface has multiple modes — try ctl.method = "ml" with different
starting values, or simplify the model.
Auto-selection picks a simple model when you expected a complex one:
AICc penalizes complexity. If you believe a more complex model is correct, fix the
order explicitly with order=p|d|q and compare the diagnostic statistics. Remember
that more parsimonious models often forecast better even when the true DGP is complex
(Hyndman & Athanasopoulos 2021, Section 8.6).
Residual autocorrelation (Ljung-Box p < 0.05): The model has not captured all the serial dependence. Try:
Increasing the AR order (higher p).
Adding seasonal terms if the data has a seasonal pattern.
Adding exogenous regressors if there is an omitted variable.
Remarks#
Auto-selection algorithm:
When order is omitted, arimaFit() performs stepwise model selection
(Hyndman & Khandakar 2008) minimizing the information criterion specified
in ctl.ic. The algorithm considers up to ctl.max_order total ARMA terms
(p+q+P+Q). Set ctl.stepwise = 0 for exhaustive search.
Deterministic terms (include logic):
The default ctl.include = "auto" automatically determines whether to
include a constant:
d |
D |
Behavior |
Label |
|---|---|---|---|
0 |
0 |
Include mean of stationary series |
Mean |
1 |
0 |
Include drift (linear trend) |
Drift |
0 |
>= 1 |
No constant |
|
>= 2 |
any |
No constant |
This matches the behavior of R’s auto.arima (Hyndman & Khandakar 2008).
Estimation method:
The default "css-ml" method uses conditional sum of squares to obtain
starting values, then refines via maximum likelihood. Use "ml" for
direct ML optimization (may be slower but avoids CSS approximation issues
with some models).
Coefficient ordering: Coefficients in result.coefs are ordered: AR(1), …, AR(p), MA(1), …, MA(q), SAR(1), …, SAR(P), SMA(1), …, SMA(Q), Mean/Drift (if present), X1, …, Xm (if xreg). The result.coef_names string array provides labels in the same order. Verification ————
Verified against three independent reference implementations:
R ``forecast`` package (Hyndman et al.): Cross-validated on 15+ classic time series datasets (Nile, AirPassengers, USAccDeaths, CO2, LakeHuron, WWWusage, sunspot.year, nottem, UKgas, JohnsonJohnson, austres, lynx) covering ARIMA, SARIMA, and ARIMAX models. Tests verify coefficients, standard errors, log-likelihood, information criteria, and forecasts.
Python ``statsmodels`` SARIMAX: Same datasets and model specifications re-verified against Python’s state-space SARIMAX implementation. Coefficients match to \(10^{-4}\) on most models.
Julia: Additional cross-validation on a subset of datasets against Julia time series packages.
Unit root tests:
KPSS test verified against R urca::ur.kpss() on 5 datasets. OCSB seasonal unit
root test (nsdiffs) verified against R forecast::nsdiffs() on 4 seasonal series.
Total: 65 passing tests across R, Python, and Julia references.
See gausslib-ts/tests/r_regression.rs.
References
———-
Box, G.E.P. and G.M. Jenkins (1970). Time Series Analysis: Forecasting and Control. Holden-Day.
Brockwell, P.J. and R.A. Davis (2002). Introduction to Time Series and Forecasting. 2nd ed., Springer.
Hyndman, R.J. and Y. Khandakar (2008). “Automatic time series forecasting: The forecast package for R.” Journal of Statistical Software, 27(3).
Hyndman, R.J. and G. Athanasopoulos (2021). Forecasting: Principles and Practice. 3rd ed., OTexts.
Kwiatkowski, D., P.C.B. Phillips, P. Schmidt, and Y. Shin (1992). “Testing the null hypothesis of stationarity against the alternative of a unit root.” Journal of Econometrics, 54(1-3), 159-178.
Library#
timeseries
Source#
arima.src
See also
Functions arimaForecast(), arimaControlCreate(), arimaResults(), arimaCoefTable(), stlDecompose()