CVar: k-fold Cross-Validation applied to an autoregressive model

Description

CVar computes the errors obtained by applying an autoregressive modelling function to subsets of the time series y using k-fold cross-validation as described in Bergmeir, Hyndman and Koo (2015). It also applies a Ljung-Box test to the residuals. If this test is significant (see returned pvalue), there is serial correlation in the residuals and the model can be considered to be underfitting the data. In this case, the cross-validated errors can underestimate the generalization error and should not be used.

Usage

CVar(
  y,
  k = 10,
  FUN = nnetar,
  cvtrace = FALSE,
  blocked = FALSE,
  LBlags = 24,
  ...
)

Arguments

Univariate time series

Number of folds to use for cross-validation.

FUN

Function to fit an autoregressive model. Currently, it only works with the nnetar function.

cvtrace

Provide progress information.

blocked

choose folds randomly or as blocks?

LBlags

lags for the Ljung-Box test, defaults to 24, for yearly series can be set to 20

...

Other arguments are passed to FUN.

Value

A list containing information about the model and accuracy for each fold, plus other summary information computed across folds.

References

Bergmeir, C., Hyndman, R.J., Koo, B. (2018) A note on the validity of cross-validation for evaluating time series prediction. Computational Statistics & Data Analysis, 120, 70-83. https://robjhyndman.com/publications/cv-time-series/.

Examples

Run this code

# NOT RUN {
modelcv <- CVar(lynx, k=5, lambda=0.15)
print(modelcv)
print(modelcv$fold1)

library(ggplot2)
autoplot(lynx, series="Data") +
  autolayer(modelcv$testfit, series="Fits") +
  autolayer(modelcv$residuals, series="Residuals")
ggAcf(modelcv$residuals)

# }