ccm: Perform convergent cross mapping using simplex projection

Description

ccm uses time delay embedding on one time series to generate an attractor reconstruction, and then applies the simplex projection algorithm to estimate concurrent values of another time series. This method is typically applied over a range of library sizes to determine whether one time series contains the dynamic information needed to recover the influence of another, causal variable.
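The two steps described above (time delay embedding, then simplex projection to estimate the other series) can be sketched in base R. This is a minimal illustration under stated assumptions: leave-one-out prediction over the whole series, Euclidean distance, E + 1 neighbors, and exponential weights relative to the nearest neighbor distance. embed_lagged() and ccm_sketch() are hypothetical helpers, not functions exported by rEDM, and the package's own handling of ties, weights and edge cases may differ.

```r
# Hypothetical helper: time delay embedding of x.
# Row k holds x[t], x[t - tau], ..., x[t - (E - 1) * tau] for t = rows[k].
embed_lagged <- function(x, E, tau = 1) {
  rows <- ((E - 1) * tau + 1):length(x)
  M <- matrix(NA_real_, nrow = length(rows), ncol = E)
  for (j in seq_len(E)) M[, j] <- x[rows - (j - 1) * tau]
  M
}

# Hypothetical helper: cross map from x (library) to y (target).
ccm_sketch <- function(x, y, E, tau = 1) {
  rows <- ((E - 1) * tau + 1):length(x)
  M <- embed_lagged(x, E, tau)   # attractor reconstruction from x
  target <- y[rows]              # concurrent values of y to recover
  pred <- numeric(length(rows))
  for (i in seq_along(rows)) {
    # Euclidean distance from point i to every reconstructed point
    d <- sqrt(rowSums(sweep(M, 2, M[i, ])^2))
    d[i] <- Inf                  # leave-one-out: never use the point itself
    nn <- order(d)[seq_len(E + 1)]          # E + 1 nearest neighbors
    w <- exp(-d[nn] / max(d[nn[1]], 1e-6))  # exponential simplex weights
    pred[i] <- sum(w * target[nn]) / sum(w)
  }
  cor(target, pred)              # cross-map skill (rho)
}

t <- 1:200
ccm_sketch(sin(0.1 * t), cos(0.1 * t), E = 2)
```

Because the pair (x[t], x[t - 1]) pins down the phase of sin(0.1 t), cross mapping with E = 2 should recover the concurrent cos(0.1 t) with rho close to 1.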

Usage

ccm(block, lib = c(1, NROW(block)), pred = lib, norm_type = c("L2 norm",
  "L1 norm", "LP norm"), P = 0.5, E = 1, tau = 1, tp = 0,
  num_neighbors = "e+1", lib_sizes = seq(10, 100, by = 10),
  random_libs = TRUE, num_samples = 100, replace = TRUE, lib_column = 1,
  target_column = 2, first_column_time = FALSE, RNGseed = NULL,
  exclusion_radius = NULL, epsilon = NULL, silent = FALSE)

Arguments

block

either a vector to be used as the time series, or a data.frame or matrix where each column is a time series

lib

a 2-column matrix (or 2-element vector) where each row specifies the first and last *rows* of the time series to use for attractor reconstruction

pred

the sections of the time series to forecast, in the same format as lib

norm_type

the distance function to use; see 'Details'

P

the exponent for the LP norm

E

the embedding dimensions to use for time delay embedding

tau

the lag to use for time delay embedding

tp

the prediction horizon (how far ahead to forecast)

num_neighbors

the number of nearest neighbors to use. Any of "e+1", "E+1", "e + 1", or "E + 1" sets this parameter to E+1 for each run; any value < 1 uses all possible neighbors.

lib_sizes

the vector of library sizes to try

random_libs

indicates whether to use randomly sampled libraries

num_samples

the number of random samples at each library size (ignored if random_libs is FALSE)

replace

indicates whether to sample vectors with replacement

lib_column

the index (or name) of the column to cross map from

target_column

the index (or name) of the column to cross map to

first_column_time

indicates whether the first column of the given block is a time column (and therefore excluded when indexing)

RNGseed

a seed for the random number generator, enabling reproducible runs of ccm with randomly generated libraries

exclusion_radius

excludes vectors from the search space of nearest neighbors if their *time index* is within exclusion_radius (NULL turns this option off)

epsilon

excludes vectors from the search space of nearest neighbors if their *distance* is farther away than epsilon (NULL turns this option off)

silent

prevents warning messages from being printed to the R console
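The neighbor-related arguments (num_neighbors, exclusion_radius, epsilon) can be read as successive filters on the candidate library vectors. The sketch below is an assumption about how they combine, written only to illustrate the descriptions above; select_neighbors() is a hypothetical helper, not part of rEDM, and it assumes leave-one-out use, where the query vector is itself in the library.

```r
# Hypothetical helper: indices of the neighbors used to predict vector i.
# dists: distances from vector i to every library vector
# times: time index of every library vector
select_neighbors <- function(dists, times, i, num_neighbors, E,
                             exclusion_radius = NULL, epsilon = NULL) {
  # "e+1", "E+1", "e + 1", "E + 1" all mean E + 1 neighbors
  if (is.character(num_neighbors) &&
      gsub(" ", "", tolower(num_neighbors)) == "e+1") {
    num_neighbors <- E + 1
  }
  ok <- seq_along(dists) != i                 # never use the point itself
  if (!is.null(exclusion_radius)) {
    # drop vectors whose time index is within exclusion_radius of i
    ok <- ok & abs(times - times[i]) > exclusion_radius
  }
  if (!is.null(epsilon)) {
    ok <- ok & dists <= epsilon               # drop vectors farther than epsilon
  }
  candidates <- which(ok)
  candidates <- candidates[order(dists[candidates])]  # nearest first
  if (num_neighbors < 1) return(candidates)   # < 1 means all neighbors
  head(candidates, num_neighbors)
}

select_neighbors(c(0, 1, 2, 3, 4, 5), 1:6, i = 1, "E + 1", E = 2,
                 exclusion_radius = 1)
```

Here the vectors at times 1 and 2 fall within the exclusion radius, so the three nearest remaining vectors (indices 3, 4 and 5) are selected.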

Value

A data.frame with forecast statistics for the different parameter settings:

L: library length (number of vectors)
num_pred: number of predictions
rho: correlation coefficient between observations and predictions
mae: mean absolute error
rmse: root mean square error
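The prediction statistics listed above follow their standard definitions. A minimal base-R sketch of how they can be computed from paired observations and predictions; forecast_stats() is a hypothetical helper, and dropping non-finite pairs before counting num_pred is an assumption about how missing predictions are treated.

```r
# Hypothetical helper: forecast statistics for one parameter setting.
forecast_stats <- function(obs, pred) {
  ok <- is.finite(obs) & is.finite(pred)  # skip pairs with missing values
  obs <- obs[ok]
  pred <- pred[ok]
  data.frame(
    num_pred = length(obs),                # number of predictions
    rho  = cor(obs, pred),                 # Pearson correlation
    mae  = mean(abs(obs - pred)),          # mean absolute error
    rmse = sqrt(mean((obs - pred)^2))      # root mean square error
  )
}

forecast_stats(c(1, 2, 3, 4), c(1.5, 2, 2.5, 4.5))
```

For this input the absolute errors are 0.5, 0, 0.5 and 0.5, so mae is 0.375 and rmse is sqrt(0.1875).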

Details

The default parameters are set so that passing a matrix as the only argument uses E = 1 (embedding dimension) and leave-one-out cross-validation over the whole time series to compute cross mapping from the first column to the second column, with the library size varying from 10 to 100 in increments of 10.
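Varying the library size with random_libs = TRUE amounts to drawing num_samples random subsets of library vectors at each size in lib_sizes; cross-map skill is then computed for each subset, and convergence shows up as rho rising with library size. A minimal sketch of the sampling step only, assuming library vectors are identified by row index; sample_libraries() is a hypothetical helper, and its seed argument merely plays the role of RNGseed.

```r
# Hypothetical helper: random libraries for each library size.
# lib_rows: rows available for attractor reconstruction
sample_libraries <- function(lib_rows, lib_sizes = seq(10, 100, by = 10),
                             num_samples = 100, replace = TRUE, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # reproducible draws, like RNGseed
  out <- lapply(lib_sizes, function(L) {
    # num_samples libraries of L rows each; sample.int() stops with an
    # error if L > length(lib_rows) and replace = FALSE
    lapply(seq_len(num_samples), function(k)
      lib_rows[sample.int(length(lib_rows), L, replace = replace)])
  })
  names(out) <- lib_sizes
  out
}

libs <- sample_libraries(1:50, lib_sizes = c(10, 20), num_samples = 5,
                         replace = FALSE, seed = 1)
lengths(libs[["20"]])
```

Each element of libs[["20"]] holds 20 distinct row indices; with replace = TRUE, rows may repeat within a library.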

norm_type "L2 norm" (default) uses the typical Euclidean distance:
$$distance(a,b) := \sqrt{\sum_i{(a_i - b_i)^2}}$$
norm_type "L1 norm" uses the Manhattan distance:
$$distance(a,b) := \sum_i{|a_i - b_i|}$$
norm_type "LP norm" uses the LP norm, generalizing the L1 and L2 norms to use $p$ (given by the argument P) as the exponent:
$$distance(a,b) := \left(\sum_i{|a_i - b_i|^p}\right)^{1/p}$$
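The three distance formulas translate directly into base R. These one-line functions are a hypothetical illustration of the formulas, not the package's internal code; p stands for the P argument.

```r
# Distances between two lagged vectors a and b, following the formulas above.
dist_l2 <- function(a, b) sqrt(sum((a - b)^2))             # "L2 norm"
dist_l1 <- function(a, b) sum(abs(a - b))                  # "L1 norm"
dist_lp <- function(a, b, p) sum(abs(a - b)^p)^(1 / p)     # "LP norm"

a <- c(0, 0)
b <- c(3, 4)
c(l2 = dist_l2(a, b), l1 = dist_l1(a, b), lp = dist_lp(a, b, p = 0.5))
```

With p = 2 and p = 1, dist_lp() reproduces the L2 and L1 distances. For p < 1, such as the default P = 0.5, the formula violates the triangle inequality and so is not a true norm, although it still orders candidate neighbors by closeness.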

Examples


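library(rEDM)
data("sardine_anchovy_sst")
# cross map from anchovy (library) to np_sst (target) with E = 3,
# drawing 100 random libraries at each size from 10 to 80
anchovy_xmap_sst <- ccm(sardine_anchovy_sst, E = 3,
  lib_column = "anchovy", target_column = "np_sst",
  lib_sizes = seq(10, 80, by = 10), num_samples = 100)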
