computeDistMat: Compute a distance matrix for functional observations

Description

This mainly internal function offers a unified framework to access the dist function from the proxy package and additional (semi-)metrics.

Usage

computeDistMat(x, y = NULL, method = "Euclidean", dmin = 0, dmax = 1,
  dmin1 = 0, dmax1 = 1, dmin2 = 0, dmax2 = 1, t1 = 0, t2 = 1,
  .poi = seq(0, 1, length.out = ncol(x)),
  custom.metric = function(x, y, lp = 2, ...) {
    return(sum(abs(x - y)^lp)^(1/lp))
  },
  a = NULL, b = NULL, c = NULL, lambda = 0, ...)

Arguments

x

[matrix] matrix containing the functional observations as rows.

y

[matrix] see x. The default NULL uses y = x.

method

[character(1)] character string describing the distance function to be used. For a full list execute metricChoices().

Euclidean: equals Lp with p = 2. This is the default.
Lp, Minkowski: the distance for an Lp-space, takes p as an additional argument in ....
Manhattan: equals Lp with p = 1.
supremum, max, maximum: equals Lp with p = Inf. The supremal pointwise difference between the curves.
Any other measure available for dist can be used as well.
shortEuclidean: Euclidean distance on a limited part of the domain. Additional arguments dmin and dmax can be specified in ..., giving the position of the first and the last point to use of an evenly spaced sequence from 0 to 1 of length length(grid). The default values are dmin = 0 and dmax = 1, which results in the Euclidean distance on the entire domain.
mean: the absolute difference of the overall mean values of the observations.
relAreas: the difference of the ratios of two areas on parts of the domain given by dmin1 to dmax1 and dmin2 to dmax2. They are defined analogously to dmin and dmax and take the same default values.
jump: the similarity of jump heights at points t1 and t2, i.e. x[t1 * length(x)] - x[t2 * length(x)] for every functional observation x. The points t1 and t2 are the positions in an evenly spaced sequence from 0 to 1 of length length(grid) for which to compare the jump height. The default values are t1 = 0 and t2 = 1.
globMax: the difference of the curves' global maxima.
globMin: the difference of the curves' global minima.
points: the mean absolute differences at certain observation points .poi, also called "points of impact". These are specified as a vector .poi of arbitrary length with values between 0 and 1, encoding the indices of the observation points. The default value is .poi = seq(0, 1, length.out = length(grid)), which results in the Manhattan distance.
custom.metric: your own semimetric will be used. Specify your own distance function in the argument custom.metric.
amplitudeDistance, phaseDistance: the amplitude distance or phase distance as described in Srivastava, A. and E. P. Klassen (2016). Functional and Shape Data Analysis. Springer.
FisherRao, elasticMetric: the elastic distance of the square root velocity of the curves as described in Srivastava and Klassen (2016). This equates to the Fisher-Rao metric.
elasticDistance: weighted mean of the amplitude and the phase distance using the implementation in elastic.distance. Additional arguments are the numeric penalization parameters a, b and c for the amplitude distance (a^2) and the phase distance (b^2). The default values are a = 1/2, b = 1. Alternatively, c denotes the ratio of 2*a and b. lambda is the additional penalization parameter for the warping allowed before calculating the elastic distance. The default is 0.
rucrdtw, rucred: Dynamic Time Warping Distance and Euclidean Distance from package rucrdtw. Implemented in Boersch-Supan (2016) and originally described in Rakthanmanon et al. (2012).
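
The relations between the Lp-type methods and some of the feature-based semimetrics listed above can be sketched in base R. This is a minimal illustration and not the package implementation: the helper names lp_dist, mean_dist, globmax_dist and globmin_dist are hypothetical, and reading "difference" as an absolute difference is an assumption.

```r
# Lp distance between two curves observed on the same grid.
# p = Inf is handled separately, because the general formula cannot be
# evaluated numerically in that case.
lp_dist <- function(x, y, p = 2) {
  d <- abs(x - y)
  if (is.infinite(p)) max(d) else sum(d^p)^(1 / p)
}

# Feature-based semimetrics, read here as absolute differences (assumption).
# They are semimetrics: two different curves can have distance 0.
mean_dist    <- function(x, y) abs(mean(x) - mean(y))
globmax_dist <- function(x, y) abs(max(x) - max(y))
globmin_dist <- function(x, y) abs(min(x) - min(y))

grid <- seq(0, 1, length.out = 11)
f <- sin(2 * pi * grid)
g <- grid^2

euclidean <- lp_dist(f, g, p = 2)    # "Euclidean": Lp with p = 2
manhattan <- lp_dist(f, g, p = 1)    # "Manhattan": Lp with p = 1
supremum  <- lp_dist(f, g, p = Inf)  # "supremum": largest pointwise difference
```

For any pair of curves the supremum distance is at most the Euclidean distance, which in turn is at most the Manhattan distance.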

dmin, dmax, dmin1, dmax1, dmin2, dmax2

[numeric(1)] encode the indices used to define subspaces for method %in% c("shortEuclidean", "relAreas") as numeric values between 0 and 1, where 0 encodes grid[1] and 1 encodes grid[length(grid)].
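
The encoding of positions as values between 0 and 1 could look as follows for method = "shortEuclidean". This is only a sketch: pos_to_index and short_euclidean are hypothetical helpers, and the rounding rule used between the two end points is an assumption, since the text above only fixes that 0 maps to the first and 1 to the last grid point.

```r
# Hypothetical helper: map a position in [0, 1] to an index of the grid.
# 0 gives the first grid point, 1 the last one; the rounding in between
# is an assumption of this sketch.
pos_to_index <- function(pos, n) {
  stopifnot(all(pos >= 0), all(pos <= 1))
  round(pos * (n - 1)) + 1
}

# Euclidean distance restricted to the grid points between dmin and dmax.
short_euclidean <- function(x, y, dmin = 0, dmax = 1) {
  stopifnot(dmin <= dmax)
  idx <- pos_to_index(dmin, length(x)):pos_to_index(dmax, length(x))
  sqrt(sum((x[idx] - y[idx])^2))
}

grid <- seq(0, 1, length.out = 101)
f <- sin(2 * pi * grid)
g <- cos(2 * pi * grid)

full       <- short_euclidean(f, g)                        # entire domain
first_half <- short_euclidean(f, g, dmin = 0, dmax = 0.5)  # grid[1] to grid[51]
```

With the default values the restricted distance coincides with the ordinary Euclidean distance, and shrinking the interval can only reduce it.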

t1, t2

[numeric(1)] encode the position of the points for which to compare the jump heights in method = "jump" as numeric values between 0 and 1, see dmin.
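
A possible reading of method = "jump" is sketched below in base R. The helpers jump_height and jump_dist are hypothetical; the mapping from t1 and t2 to indices follows the convention stated for dmin (0 is the first grid point, 1 the last) and the use of the absolute difference of the two jump heights is an assumption.

```r
# Jump height of one curve: value at position t1 minus value at position t2.
jump_height <- function(x, t1 = 0, t2 = 1) {
  n <- length(x)
  i1 <- round(t1 * (n - 1)) + 1  # assumed index mapping, see lead-in
  i2 <- round(t2 * (n - 1)) + 1
  x[i1] - x[i2]
}

# Semimetric: absolute difference of the jump heights of two curves.
jump_dist <- function(x, y, t1 = 0, t2 = 1) {
  abs(jump_height(x, t1, t2) - jump_height(y, t1, t2))
}

grid <- seq(0, 1, length.out = 11)
f <- grid      # rises from 0 to 1
g <- 2 * grid  # rises from 0 to 2

d_default <- jump_dist(f, g)                      # first versus last grid point
d_inner   <- jump_dist(f, g, t1 = 0.2, t2 = 0.6)  # two interior points
```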

.poi

[numeric(1 to ncol(x))] numeric vector of arbitrary length taking values between 0 and 1, denoting the position of the points of interest for method = "points". The default value is .poi = seq(0, 1, length.out = length(grid)), which results in the Manhattan distance.

custom.metric

[function(x, y, ...)] a function specifying how to compute the distance between two functional observations (= numeric vectors of the same length) x and y. It can handle additional arguments in .... The default is the Euclidean distance (equals Minkowski distance with lp = 2). Used for method = "custom.metric".
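
How a function of this form can be written and applied to all pairs of observations is sketched below in base R. The function minkowski repeats the default shown in the usage section; diff_metric and pairwise_dist are hypothetical names introduced for illustration and are not part of the package.

```r
# The default custom.metric from the usage section: Minkowski distance.
minkowski <- function(x, y, lp = 2, ...) {
  sum(abs(x - y)^lp)^(1 / lp)
}

# A user-defined alternative: Euclidean distance between first differences,
# which ignores vertical shifts of the curves.
diff_metric <- function(x, y, ...) {
  sqrt(sum((diff(x) - diff(y))^2))
}

# Hypothetical helper: apply a metric to every pair of rows of x and y,
# passing additional arguments on through `...`.
pairwise_dist <- function(x, y = x, metric, ...) {
  out <- matrix(NA_real_, nrow = nrow(x), ncol = nrow(y))
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(nrow(y))) {
      out[i, j] <- metric(x[i, ], y[j, ], ...)
    }
  }
  out
}

grid <- seq(0, 1, length.out = 20)
x <- rbind(sin(2 * pi * grid), cos(2 * pi * grid), grid)
shifted <- x + 1  # same shapes, shifted upwards

d_l1    <- pairwise_dist(x, metric = minkowski, lp = 1)     # lp passed via ...
d_shift <- pairwise_dist(x, shifted, metric = diff_metric)  # diagonal close to 0
```

Any function that takes two numeric vectors of equal length and returns a single non-negative number fits this pattern.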

a, b, c

[numeric(1)] weights of the amplitude distance (a) and the phase distance (b) in a semimetric that combines them by addition. Used for method == 'elasticDistance'.

lambda

[numeric(1)] penalization parameter for the warping allowed before calculating the elastic distance. Default value is 0. Large values imply less (no) warping, small values imply more warping. Used for method %in% c('elastic', 'SRV').

...

additional parameters to the (semi-)metrics.

Value

a matrix of dimensions nrow(x) by nrow(y) containing the distances of the functional observations contained in x and y, if y is specified. Otherwise a matrix containing the distances of all functional observations within x to each other.
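
The shape of the returned matrix can be illustrated with a small base R stand-in. The function cross_dist is hypothetical and uses the Euclidean distance only; it is meant to show the dimensions described above, not the package code.

```r
# Hypothetical stand-in: Euclidean distances between the rows of x and y.
cross_dist <- function(x, y = NULL) {
  if (is.null(y)) y <- x  # mirrors the documented default y = x
  euclid <- function(i, j) sqrt(sum((x[i, ] - y[j, ])^2))
  outer(seq_len(nrow(x)), seq_len(nrow(y)), Vectorize(euclid))
}

grid <- seq(0, 1, length.out = 10)
x <- rbind(grid, grid^2, grid^3)    # 3 functional observations
y <- rbind(sqrt(grid), 1 - grid)    # 2 functional observations

d_xy <- cross_dist(x, y)  # 3 x 2: rows index x, columns index y
d_xx <- cross_dist(x)     # 3 x 3: symmetric with zero diagonal
```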

References

Boersch-Supan (2016). rucrdtw: Fast time series subsequence search in R. The Journal of Open Source Software. URL http://doi.org/10.21105/joss.00100

Fuchs, K., J. Gertheiss, and G. Tutz (2015). Nearest neighbor ensembles for functional data with interpretable feature selection. Chemometrics and Intelligent Laboratory Systems 146, 186-197.

Rakthanmanon, Thanawin, et al. "Searching and mining trillions of time series subsequences under dynamic time warping." Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2012.

Srivastava, A. and E. P. Klassen (2016). Functional and Shape Data Analysis. Springer.