ccm: Convergent cross mapping using simplex projection

Description

ccm uses time delay embedding on one time series to reconstruct an attractor, and then applies the simplex projection algorithm to estimate concurrent values of another time series. The method is typically run over a range of library sizes to determine whether one time series contains the dynamic information needed to recover the influence of another, causal variable.
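The procedure described above can be illustrated with a minimal base-R sketch. This is not the rEDM implementation: the helper names delay_embed and cross_map_rho, the lib_rows argument, the exponential weighting, and the sine/cosine test series are all assumptions chosen for illustration.

```r
# Hypothetical helper: delay embedding of x with E coordinates,
# x(t), x(t-1), ..., x(t-(E-1)), one row per usable time point.
delay_embed <- function(x, E) {
  rows <- E:length(x)
  emb <- sapply(0:(E - 1), function(lag) x[rows - lag])
  matrix(emb, nrow = length(rows))
}

# Hypothetical helper: estimate concurrent values of y from the embedding
# of x, then return the correlation between observed and estimated y.
cross_map_rho <- function(x, y, E = 2, lib_rows = NULL) {
  emb <- delay_embed(x, E)
  target <- y[E:length(y)]          # values of y concurrent with each row
  n <- nrow(emb)
  if (is.null(lib_rows)) lib_rows <- seq_len(n)
  stopifnot(length(lib_rows) > E + 1)
  pred <- numeric(n)
  for (i in seq_len(n)) {
    cand <- setdiff(lib_rows, i)    # leave-one-out: never use the point itself
    d <- sqrt(rowSums(sweep(emb[cand, , drop = FALSE], 2, emb[i, ])^2))
    nn <- order(d)[seq_len(E + 1)]  # E + 1 nearest neighbors
    w <- exp(-d[nn] / max(d[nn[1]], 1e-12))  # weights scaled by nearest distance
    pred[i] <- sum(w * target[cand[nn]]) / sum(w)
  }
  cor(target, pred)
}

# Illustrative data (an assumption, not from the package)
time_idx <- 1:200
x <- sin(0.3 * time_idx)
y <- cos(0.3 * time_idx)

set.seed(1)
rho_small <- cross_map_rho(x, y, E = 2, lib_rows = sample(199, 10))
rho_full <- cross_map_rho(x, y, E = 2)
```

Comparing rho_small with rho_full mirrors the idea of varying the library size: cross-map skill is evaluated repeatedly as more library vectors become available.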

Usage

ccm(block, lib = NULL, pred = NULL, norm = 2, E = 1, tau = -1, 
    tp = 0, num_neighbors = "e+1", lib_sizes = c(10, 75, 5), 
    random_libs = TRUE, num_samples = 100, replace = FALSE, lib_column = 1, 
    target_column = 2, first_column_time = FALSE, RNGseed = NULL, 
    exclusion_radius = NULL, epsilon = NULL, stats_only = TRUE, 
    silent = TRUE)

Arguments

block

either a vector to be used as the time series, or a data.frame or matrix where each column is a time series

lib

a 2-column matrix, data.frame, 2-element vector or string of row index pairs, where each pair specifies the first and last *rows* of the time series to create the library. If not specified, all available rows are used

pred

(same format as lib), but specifying the sections of the time series to forecast. If not specified, set equal to lib

norm

the distance measure to use. see 'Details'

E

the embedding dimension to use for time delay embedding

tau

the time-delay offset to use for time delay embedding

tp

the prediction horizon (how far ahead to forecast)

num_neighbors

the number of nearest neighbors to use. Note that the default value will change depending on the method selected. (Any of "e+1", "E+1", "e + 1", "E + 1" will set this parameter to E+1 for each run.)

lib_sizes

three integers specifying the start, stop and increment index of library sizes

random_libs

indicates whether to use randomly sampled libs

num_samples

the number of random samples at each lib size (this parameter is ignored if random_libs is FALSE)

replace

indicates whether to sample vectors with replacement

lib_column

name (index) of the column to cross map from

target_column

name (index) of the column to forecast

first_column_time

indicates whether the first column of the given block is a time column

RNGseed

will set a seed for the random number generator, enabling reproducible runs of ccm with randomly generated libraries

exclusion_radius

excludes vectors from the search space of nearest neighbors if their *time index* is within exclusion_radius (NULL turns this option off)

epsilon

not implemented

stats_only

specify whether to output just the forecast statistics or the raw predictions for each run

silent

prevents warning messages from being printed to the R console
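As a rough illustration of two of the arguments above, the base-R sketch below expands a lib_sizes triple into the individual library sizes and lists the lag offsets implied by E and tau. The variable names are hypothetical, and the reading of a negative tau as lags into the past is an assumption about the convention, not something stated on this page.

```r
# Expand the (start, stop, increment) triple into the library sizes to evaluate.
lib_sizes <- c(10, 75, 5)
sizes <- seq(from = lib_sizes[1], to = lib_sizes[2], by = lib_sizes[3])

# Lag offsets for an embedding with E coordinates and offset tau
# (assumed convention: tau = -1 gives x(t), x(t-1), x(t-2), ...).
E <- 3
tau <- -1
offsets <- (0:(E - 1)) * tau
```

With the defaults shown in Usage, sizes runs from 10 to 75 in steps of 5.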

Value

If stats_only = TRUE: a data.frame with forecast statistics for both the forward and reverse mappings:

LibSize	library length (number of vectors)
x:y	cross mapped correlation coefficient between observations x and predictions y
y:x	cross mapped correlation coefficient between observations y and predictions x
E	embedding dimension
tau	time delay offset
tp	forecast interval
nn	number of nearest neighbors

If stats_only = FALSE: a named list with the following items:

LibMeans	data.frame with the mean bidirectional forecast statistics
CCM1_PredictStat	data.frame with forward mapped prediction statistics for each prediction of the ensemble
CCM1_Predictions	list of prediction result data.frames, one for each forward mapped prediction of the ensemble
CCM2_PredictStat	data.frame with reverse mapped prediction statistics for each prediction of the ensemble
CCM2_Predictions	list of prediction result data.frames, one for each reverse mapped prediction of the ensemble

CCM1_PredictStat and CCM2_PredictStat data.frames have columns:

N	prediction number
E	embedding dimension
nn	number of nearest neighbors
tau	embedding time delay offset
LibSize	library size
rho	correlation coefficient
RMSE	root mean square error
MAE	mean absolute error
lib	column name of the library vector
target	column name of the target vector
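For orientation, the three error statistics listed above can be computed from a vector of observations and a vector of predictions as in the base-R sketch below. The numbers are made up for illustration, and MAE is taken here in its usual sense of mean absolute error; this is a sketch of the standard definitions, not the package's internal code.

```r
# Illustrative values only (not package output)
obs  <- c(1.0, 2.0, 3.0, 4.0)
pred <- c(1.1, 1.9, 3.2, 3.7)

err  <- pred - obs
rho  <- cor(obs, pred)        # Pearson correlation coefficient
rmse <- sqrt(mean(err^2))     # root mean square error
mae  <- mean(abs(err))        # mean absolute error
```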

Details

ccm runs both forward and reverse cross maps in separate threads. Results are returned for both mappings. The default parameters are set so that passing a matrix as the only argument will use E = 1 (embedding dimension), and leave-one-out cross-validation over the whole time series to compute cross-mapping from the first column to the second column, letting the library size vary from 10 to 75 in increments of 5.

norm = 2 (only option currently available) uses the "L2 norm", Euclidean distance: $$distance(a,b) := \sqrt{\sum_i{(a_i - b_i)^2}} $$
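The distance formula above translates directly into base R. The function name l2_distance is hypothetical and used only for this sketch.

```r
# Euclidean (L2) distance between two numeric vectors of equal length
l2_distance <- function(a, b) {
  stopifnot(length(a) == length(b))
  sqrt(sum((a - b)^2))
}

l2_distance(c(0, 0), c(3, 4))  # 5
```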

Examples

# NOT RUN {
anchovy_xmap_sst <- ccm(sardine_anchovy_sst, E = 3, 
  lib_column = "anchovy", target_column = "np_sst", 
  lib_sizes = c(10, 75, 5), num_samples = 100)
# }