dtw: Dynamic Time Warping

Description

Calculate the DTW distance, cost matrices and direction matrices including the warping path two multivariate time series.

Usage

dtw(Q, C, dist_method = c("norm1", "norm2", "norm2_square"), 
    step_pattern = c("symmetric2", "symmetric1"), ws = NULL,
    return_cm = FALSE,
    return_diffM = FALSE,
    return_wp = FALSE,
    return_diffp = FALSE,
    return_QC = FALSE)
                     
cm(Q, C, dist_method = c("norm1", "norm2", "norm2_square"), ws = NULL)

Arguments

Query time series. Q needs to be one of the following: (1) a one dimensional vector, (2) a matrix where each row is one observations and each column is one dimension of the time series, or (3) a matrix of differences/ costs (diffM, cm). If Q and C are matrices they need to have the same number of columns.

Candidate time series. C needs to be one of the following: (1) a one dimensional vector, (2) a matrix where each row is one observations and each column is one dimension of the time series, or (3) if Q is a matrix of differences or costs C needs to be the respective character string 'diffM' or 'cm'.

dist_method

character, describes the method of distance measure for multivariate time series (this parameter is ignored for univariate time series). Currently supported methods are 'norm1' (default, is the Manhattan distance), 'norm2' (is the Euclidean distance) and 'norm2_square' For the function cm the parameter dist_method can also be a user defined distance function.

step_pattern

character, describes the step pattern. Currently implemented are the patterns symmetric1 and symmetric2, see details.

integer, describes the window size for the sakoe chiba window. If NULL, then no window is applied. (default = NULL)

return_cm

boolean, if TRUE then the Matrix of costs (the absolute value) is returned. (default = FALSE)

return_diffM

boolean, if TRUE then the Matrix of differences (not the absolute value) is returned. (default = FALSE)

return_wp

boolean, if TRUE then the warping path is returned. (default = FALSE) If return_diffp == TRUE, then return_wp is set to TRUE as well.

return_diffp

boolean, if TRUE then the path of differences (not the absolute value) is returned. (default = FALSE)

return_QC

boolean, if TRUE then the input vectors Q and C are appended to the returned list. This is useful for the plot.idtw function. (default = FALSE)

Value

distance

the DTW distance, that is the element of the last row and last column of gcm

normalized_distance

the normalized DTW distance, that is the distance divided by N+M, where N and M are the lengths of the time series Q and C, respectively. If step_pattern == 'symmetric1' no normalization is performed and NA is returned (see details).

gcm

global cost matrix

direction matrix (3=up, 1=diagonal, 2=left)

warping path

indices of Q of the optimal path

indices of C of the optimal path

Matrix of costs

diffM

Matrix of differences

diffp

path of differences

input Q

input C

Details

The dynamic time warping distance is the element in the last row and last column of the global cost matrix.

For the multivariate case where Q is a matrix of n rows and k columns and C is a matrix of m rows and k columns the dist_method parameter defines the following distance measures:

norm1: $$dist(Q_{i,.}, C_{j,.}) = \sum_{l = 1}^k |Q_{i,l} - C_{j,l}|$$

norm2: $$dist(Q_{i,.}, C_{j,.}) = \sqrt{\sum_{l = 1}^k (Q_{i,l} - C_{j,l})^2}$$

norm2_square: $$dist(Q_{i,.}, C_{j,.}) = \sum_{l = 1}^k (Q_{i,l} - C_{j,l})^2$$

The parameter step_pattern describes how the two time series are aligned. If step_pattern == "symmetric1" then $$gcm_{i,j} = cm{i,j} + min(gcm_{i-1,j}, gcm{i-1, j-1}, gcm{i, i-1} $$.

If step_pattern == "symmetric2" then $$gcm_{i,j} = cm{i,j} + min(gcm_{i-1,j}, cm{i,j}+ gcm{i-1, j-1}, gcm{i, i-1} $$.

To make DTW distances comparable for many time series of different lengths use the normlized_distance with the setting step_method = 'symmetric2'. Please find a more detailed discussion and further references here: http://dtw.r-forge.r-project.org/.

References

Sakoe, H.; Chiba, S., Dynamic programming algorithm optimization for spoken word recognition, Acoustics, Speech, and Signal Processing [see also IEEE Transactions on Signal Processing], IEEE Transactions on , vol.26, no.1, pp. 43-49, Feb 1978. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1163055

Examples

Run this code

# NOT RUN {
# multivariate
Q <- matrix(rnorm(100), ncol=2)
C <- matrix(rnorm(80), ncol=2)
tmp <- dtw(Q = Q, C = C, ws = 30)


# univariate
Q <- cumsum(rnorm(100))
C <- Q[11:100] + rnorm(90, 0, 0.5)
tmp <- dtw(Q = Q, C = C, ws = 15, return_diffM = FALSE)
names(tmp)
tmp$distance



# compare different input variations
dtw_base <- dtw(Q = Q, C = C, ws = 15, return_diffM = TRUE)
dtw_diffM <- dtw(Q = dtw_base$diffM, C = "diffM", ws = 15, return_diffM = TRUE)
dtw_cm <- dtw(Q = abs(dtw_base$diffM), C = "cm", ws = 15, return_diffM = TRUE)

identical(dtw_base$gcm, dtw_cm$gcm)
identical(dtw_base$gcm, dtw_diffM$gcm)

# of course no diffM is returned in the 'cm'-case
dtw_cm$diffM

# multivariate case
Q <- matrix(rnorm(100), ncol=2)
C <- matrix(rnorm(80), ncol=2)
dtw(Q = Q, C = C, ws = 30, dist_method = "norm2")
# }

Run the code above in your browser using DataLab