diss.PRED: Dissimilarity Measure Based on Nonparametric Forecast

Description

Computes the dissimilarity between two time series as the L1 distance between the kernel estimators of their forecast densities at a pre-specified horizon.

Usage

diss.PRED(x, y, h = 5, B = 500, logarithms = c(FALSE, FALSE), differences = c(0, 0), plot = FALSE)
multidiss.PRED(series, h = 5, B = 500, logarithms = NULL, differences = NULL, plot = FALSE)

Arguments

x

Numeric vector containing the first of the two time series.

y

Numeric vector containing the second of the two time series.

h

The horizon of interest, i.e., the number of steps ahead at which the prediction is evaluated.

B

The number of bootstrap resamples.

logarithms

Boolean vector. Specifies whether to transform each series by taking logarithms or not.

differences

Numeric vector. Specifies the number of differences to apply to each series.

plot

If TRUE, plot the resulting forecast densities.

series

Numeric matrix. Each row contains one time series.

Value

diss.PRED returns a list with the following components.

L1dist

The computed distance.

dens.x

A 2-column matrix with the density of prediction of series x. The first column is the base (x) and the second column is the value (y) of the density.

dens.y

A 2-column matrix with the density of prediction of series y. The first column is the base (x) and the second column is the value (y) of the density.

multidiss.PRED returns a list with the following components.

dist

A dist object with the pairwise L1 distances between the series.

densities

A list of 2-column matrices containing the densities of each series, in the same format as dens.x and dens.y of diss.PRED.

Details

The dissimilarity between the time series x and y is given by $$d(x,y) = \int{ | f_{x,h}(u) - f_{y,h}(u) | du}$$ where $f_{x,h}$ and $f_{y,h}$ are kernel density estimators of the forecast densities h steps ahead of x and y, respectively. The horizon of interest h is specified in advance by the user.

The kernel density estimators are based on B bootstrap replicates obtained with a resampling procedure that mimics the generating processes, which are assumed to follow an arbitrary autoregressive structure (parametric or non-parametric). The procedure is described in full in Vilar et al. (2010). This function has a high computational cost because of the bootstrap procedure.

multidiss.PRED computes the pairwise dissimilarities for more than two series, each of which may require a different logarithm transform or a different number of differences. In this case, logarithms defaults to FALSE and differences defaults to 0.
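The integral above can be approximated numerically once both kernel density estimates are evaluated on a common grid. The following is a minimal base R sketch, not the code used by diss.PRED: the vectors boot.x and boot.y are hypothetical stand-ins for the B bootstrap forecast replicates (here simply drawn from two normal distributions), and the grid limits and trapezoidal rule are illustrative choices.

```r
# Hypothetical stand-ins for B bootstrap forecast replicates at horizon h.
# In diss.PRED these come from the resampling procedure, not from rnorm.
set.seed(1)
B <- 500
boot.x <- rnorm(B, mean = 0, sd = 1)
boot.y <- rnorm(B, mean = 1, sd = 1)

# Evaluate both kernel density estimates on one common grid.
lo <- min(boot.x, boot.y) - 3
hi <- max(boot.x, boot.y) + 3
dens.x <- density(boot.x, from = lo, to = hi, n = 512)
dens.y <- density(boot.y, from = lo, to = hi, n = 512)

# Approximate the integral of |f_x - f_y| with the trapezoidal rule.
gap <- abs(dens.x$y - dens.y$y)
step <- diff(dens.x$x)
l1 <- sum(step * (gap[-1] + gap[-length(gap)]) / 2)
l1
```

Because each density integrates to at most 1, the resulting value lies between 0 (identical forecast densities) and 2 (forecast densities with disjoint support).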

References

Alonso, A.M., Berrendero, J.R., Hernandez, A. and Justel, A. (2006) Time series clustering based on forecast densities. Comput. Statist. Data Anal., 51, 762--776.

Vilar, J.A., Alonso, A.M. and Vilar, J.M. (2010) Non-linear time series clustering based on non-parametric forecast densities. Comput. Statist. Data Anal., 54 (11), 2850--2865.

Examples

library(TSclust)

x <- rnorm(100)
x <- x + abs(min(x)) + 1 # shift so that all values are positive, as the logarithm transform requires
y <- rnorm(100)
z <- sin(seq(0, pi, length.out = 100))
## Compute the distance and check for coherent results
diss.PRED(x, y, h = 5, logarithms = c(FALSE, FALSE), differences = c(1, 0))
## Create a dist object for use with clustering functions such as pam or hclust
multidiss.PRED(rbind(x, y, z), h = 5, B = 500, logarithms = c(TRUE, FALSE, FALSE), differences = c(1, 1, 2))
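A dist object such as the one returned by multidiss.PRED can be passed directly to hclust. The sketch below uses only base R and a hand-written distance matrix m as a hypothetical stand-in for that dist component, so the numbers are illustrative and not output of multidiss.PRED.

```r
# Hypothetical pairwise L1 distances for three series named x, y and z,
# standing in for the dist component returned by multidiss.PRED.
m <- matrix(c(0.0, 0.2, 1.5,
              0.2, 0.0, 1.4,
              1.5, 1.4, 0.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("x", "y", "z"), c("x", "y", "z")))
d <- as.dist(m)

# Average-linkage hierarchical clustering, cut into two groups.
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)
groups
```

With these illustrative distances, x and y fall in the same group and z forms its own, since x and y are much closer to each other than either is to z.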
