cv.odpc: Automatic Choice of Tuning Parameters for One-Sided Dynamic Principal Components via Cross-Validation

Description

Computes One-Sided Dynamic Principal Components, choosing the number of components and lags automatically, to minimize an estimate of the forecasting mean squared error.

Usage

cv.odpc(
  Z,
  h,
  k_list = 1:5,
  max_num_comp = 5,
  window_size,
  ncores_k = 1,
  ncores_w = 1,
  method,
  tol = 1e-04,
  niter_max = 500,
  train_tol = 0.01,
  train_niter_max = 100
)

Arguments

Z

Data matrix. Each column is a different time series.

h

Forecast horizon.

k_list

List of values of k to choose from.

max_num_comp

Maximum possible number of components to compute.

window_size

The size of the rolling window used to estimate the forecasting error.

ncores_k

Number of cores to parallelise over k_list.

ncores_w

Number of cores to parallelise over the rolling window (nested in k_list).

method

A string specifying the algorithm used. Options are 'ALS', 'mix' or 'gradient'. See details in odpc.

tol

Relative precision. Default is 1e-4.

niter_max

Integer. Maximum number of iterations. Default is 500.

train_tol

Relative precision used in cross-validation. Default is 1e-2.

train_niter_max

Integer. Maximum number of iterations used in cross-validation. Default is 100.

Value

An object of class odpcs, that is, a list of length equal to the number of computed components, each computed using the optimal value of k. The i-th entry of this list is an object of class odpc, that is, a list with entries

f

Coordinates of the i-th dynamic principal component corresponding to the periods $k_1 + 1,\dots,T$.

mse

Mean squared error of the reconstruction using the first i components.

k1

Number of lags used to define the i-th dynamic principal component f.

k2

Number of lags of f used to reconstruct.

alpha

Vector of intercepts corresponding to f.

a

Vector that defines the i-th dynamic principal component.

B

Matrix of loadings corresponding to f. Row number $k$ is the vector of $k-1$ lag loadings.

call

The matched call.

conv

Logical. Did the iterations converge?

components, fitted, plot and print methods are available for this class.

Details

We assume that for each component $k_{1}^{i}=k_{2}^{i}$, that is, the number of lags of $\mathbf{z}_{t}$ used to define the dynamic principal component and the number of lags of $\widehat{f}^{i}_{t}$ used to reconstruct the original series are the same. The number of components and lags is chosen to minimize the cross-validated forecasting error in a stepwise fashion.

Suppose we want to make $h$-steps ahead forecasts. Let $w=$ window_size. Then, given $k\in$ k_list, we compute the first ODPC defined using $k$ lags, using periods $1,\dots,T-h-t+1$ for $t=1,\dots,w$, and for each of these fits we compute an $h$-steps ahead forecast and the corresponding mean squared error $E_{t,h}$. The cross-validation estimate of the forecasting error is then

$$ \widehat{MSE}_{1,k}=\frac{1}{w}\sum\limits_{t=1}^{w}E_{t,h}. $$

We choose for the first component the value $k^{\ast,1}$ that minimizes $\widehat{MSE}_{1,k}$. Then, we fix the first component computed with $k^{\ast,1}$ lags and repeat the procedure with the second component. If the optimal cross-validated forecasting error using the two components, $\widehat{MSE}_{2,k^{\ast,2}}$, is larger than the one using only one component, $\widehat{MSE}_{1,k^{\ast,1}}$, we stop and output as a final model the ODPC computed using one component defined with $k^{\ast,1}$ lags; otherwise, if max_num_comp $\geq 2$ we add the second component defined using $k^{\ast,2}$ lags and proceed as before.
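The rolling-window estimate described above can be sketched in base R as follows. This is only an illustration of the indexing, not the package's internal code: rolling_mse and naive_forecast are hypothetical names, and naive_forecast (repeat the last observed row) is a placeholder for "fit an ODPC with $k$ lags and forecast $h$ steps ahead". Averaging the squared errors over the series to obtain $E_{t,h}$ is also an assumption.

```r
# Rolling-window estimate of the h-steps ahead forecasting MSE.
# `forecast_fun` is a hypothetical stand-in for "fit a model on the training
# rows and forecast h steps ahead"; it is not part of the odpc package.
rolling_mse <- function(Z, h, window_size, forecast_fun) {
  T_len <- nrow(Z)
  errors <- numeric(window_size)
  for (t in seq_len(window_size)) {
    train_end <- T_len - h - t + 1   # fit on periods 1, ..., T - h - t + 1
    target <- train_end + h          # period being forecast: T - t + 1
    pred <- forecast_fun(Z[seq_len(train_end), , drop = FALSE], h)
    errors[t] <- mean((Z[target, ] - pred)^2)   # plays the role of E_{t,h}
  }
  mean(errors)                       # (1/w) * sum over t of E_{t,h}
}

# Placeholder forecaster (an assumption for illustration only):
# repeat the last observed row of the training window.
naive_forecast <- function(Z_train, h) Z_train[nrow(Z_train), ]

set.seed(1)
Z <- matrix(rnorm(50 * 3), 50, 3)
mse_hat <- rolling_mse(Z, h = 1, window_size = 5, forecast_fun = naive_forecast)
```

In the procedure described above, an estimate of this kind would be computed once for every $k$ in k_list, and the $k$ with the smallest value retained for the current component.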

This method can be computationally costly, especially for large values of window_size. Ideally, the user should set ncores_k equal to the length of k_list and ncores_w equal to window_size; this would entail using ncores_k times ncores_w cores in total.
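When fewer cores are available than the ideal product, one possible way to split them is sketched below. The allocation rule (fill the loop over k_list first, then divide what remains across the window) is an assumption made for illustration, not a recommendation from the package; the values of k_list and window_size are arbitrary.

```r
k_list <- 1:5
window_size <- 4

# detectCores() can return NA on some platforms, so fall back to 1.
avail <- parallel::detectCores()
if (is.na(avail)) avail <- 1L

# Give the outer loop (over k_list) priority, then split what is left
# across the rolling window; both counts are at least 1.
ncores_k <- min(length(k_list), avail)
ncores_w <- max(1L, min(window_size, avail %/% ncores_k))

# Total number of cores the nested loops would use.
total_cores <- ncores_k * ncores_w
```

The resulting ncores_k and ncores_w can then be passed to the arguments of the same name, and total_cores never exceeds the number of detected cores.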

References

Peña D., Smucler E. and Yohai V.J. (2017). “Forecasting Multiple Time Series with One-Sided Dynamic Principal Components.” Available at https://arxiv.org/abs/1708.04705.

Examples

# NOT RUN {
T <- 50 #length of series
m <- 10 #number of series
set.seed(1234)
f <- rnorm(T + 1)
x <- matrix(0, T, m)
u <- matrix(rnorm(T * m), T, m)
for (i in 1:m) {
  x[, i] <- 10 * sin(2 * pi * (i/m)) * f[1:T] + 10 * cos(2 * pi * (i/m)) * f[2:(T + 1)] + u[, i]
}
# Choose parameters to perform a one step ahead forecast. Use 1 or 2 lags, only one component 
# and a window size of 2 (artificially small to keep computation time low). Use two cores for the
# loop over k, two cores for the loop over the window
fit <- cv.odpc(x, h=1, k_list = 1:2, max_num_comp = 1, window_size = 2, ncores_k = 2, ncores_w = 2)
# }
