
dcortools (version 0.1.7)

distcov: Calculates the distance covariance (Székely et al., 2007; Székely and Rizzo, 2009).

Description

Calculates the distance covariance (Székely et al., 2007; Székely and Rizzo, 2009).

Usage

distcov(
  X,
  Y,
  affine = FALSE,
  standardize = FALSE,
  bias.corr = FALSE,
  type.X = "sample",
  type.Y = "sample",
  metr.X = "euclidean",
  metr.Y = "euclidean",
  use = "all",
  algorithm = "auto"
)

Value

numeric; the distance covariance between samples X and Y.
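For orientation, the classical sample distance covariance can be sketched in base R via double-centred distance matrices. The helper names `double_center` and `dcov_sketch` are hypothetical and not part of dcortools. The sketch assumes Euclidean distances and the uncorrected (V-statistic) estimator, and this page does not state whether `distcov` reports this quantity or its square.

```r
# Double-centre a distance matrix: subtract row and column means, add the grand mean.
double_center <- function(D) {
  D - outer(rowMeans(D), colMeans(D), "+") + mean(D)
}

# Hypothetical helper: square root of the mean of the element-wise product
# of the two double-centred Euclidean distance matrices.
dcov_sketch <- function(X, Y) {
  A <- double_center(as.matrix(dist(X)))
  B <- double_center(as.matrix(dist(Y)))
  sqrt(max(mean(A * B), 0))  # guard against tiny negative rounding errors
}

set.seed(1)
X <- rnorm(100)
Y <- X + 3 * rnorm(100)
dcov_sketch(X, Y)
```

Both distance matrices are held in memory, so this form needs O(n^2) storage.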

Arguments

X

contains either the first sample or its corresponding distance matrix.

In the first case, X can be provided either as a vector (if one-dimensional), a matrix or a data.frame (if two-dimensional or higher).

In the second case, the input must be a distance matrix corresponding to the sample of interest.

If X is a sample, type.X must be specified as "sample". If X is a distance matrix, type.X must be specified as "distance".
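As a sketch of the second case, distance matrices can be built with base R's `dist()` and passed with `type.X = "distance"` and `type.Y = "distance"`. The object names `DX` and `DY` are illustrative, and converting the `dist` object with `as.matrix()` is an assumption about the expected input. The call to `distcov` is guarded so the snippet also runs where dcortools is not installed.

```r
set.seed(1)
X <- matrix(rnorm(200), nrow = 100, ncol = 2)  # two-dimensional sample
Y <- X[, 1] + rnorm(100)                       # one-dimensional sample

# Full symmetric Euclidean distance matrices (100 x 100 each)
DX <- as.matrix(dist(X))
DY <- as.matrix(dist(Y))

if (requireNamespace("dcortools", quietly = TRUE)) {
  from_sample   <- dcortools::distcov(X, Y)
  from_distance <- dcortools::distcov(DX, DY,
                                      type.X = "distance", type.Y = "distance")
  # With the default Euclidean metric the two values would be expected to agree.
  print(c(from_sample, from_distance))
}
```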

Y

see X.

affine

logical; specifies whether the affinely invariant distance covariance (Dueck et al., 2014) should be calculated.

standardize

logical; specifies whether X and Y should be standardized by dividing each component by its standard deviation. Has no effect when affine = TRUE.
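The standardization described above can be sketched in base R as a column-wise division by the sample standard deviation. This is an illustration only: the name `X_std` is hypothetical, and it is assumed that no centring is applied (Euclidean distances are unaffected by centring in any case).

```r
set.seed(1)
# Two components on very different scales
X <- cbind(rnorm(100, sd = 1), rnorm(100, sd = 10))

# Divide each column by its own standard deviation
X_std <- sweep(X, 2, apply(X, 2, sd), "/")

apply(X_std, 2, sd)  # each component now has unit standard deviation
```

Without this step, the second component would dominate the Euclidean distances between observations.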

bias.corr

logical; specifies whether the bias-corrected version of the sample distance covariance (Huo and Székely, 2016) should be calculated.
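One common way to write a bias-corrected estimator of the squared distance covariance uses U-centred distance matrices. The sketch below follows that form in base R. The helper names `u_center` and `dcov2_unbiased` are hypothetical, the code is not taken from dcortools, and this page does not state whether `distcov` returns this squared quantity or a signed root of it.

```r
# U-centre a symmetric distance matrix (requires n > 3 observations).
u_center <- function(D) {
  n <- nrow(D)
  r <- rowSums(D)  # row sums equal column sums because D is symmetric
  U <- D - outer(r, r, "+") / (n - 2) + sum(D) / ((n - 1) * (n - 2))
  diag(U) <- 0     # diagonal entries are set to zero by definition
  U
}

# Hypothetical helper: unbiased estimator of the squared distance covariance.
dcov2_unbiased <- function(X, Y) {
  A <- u_center(as.matrix(dist(X)))
  B <- u_center(as.matrix(dist(Y)))
  n <- nrow(A)
  sum(A * B) / (n * (n - 3))
}

set.seed(1)
X <- rnorm(50)
Y <- rnorm(50)
dcov2_unbiased(X, Y)  # can be negative, since the estimator is unbiased
```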

type.X

For "distance", X is interpreted as a distance matrix. For "sample", X is interpreted as a sample.

type.Y

see type.X.

metr.X

specifies the metric which should be used to compute the distance matrix for X (ignored when type.X = "distance").

Options are "euclidean", "discrete", "alpha", "minkowski", "gaussian", "gaussauto", "boundsq" or user-specified metrics (see examples).

For "alpha", "minkowski", "gaussian", "gaussauto" and "boundsq", the corresponding parameter is specified via c(metric, parameter), e.g. c("gaussian", 3) for a Gaussian metric with bandwidth parameter 3; the default parameter is 2 for "minkowski" and 1 for all other metrics.

See Lyons (2013), Sejdinovic et al. (2013) and Böttcher et al. (2017) for details.
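A small base R note on the `c(metric, parameter)` form: because `c()` returns an atomic vector, mixing a string and a number coerces the number to a string. The snippet only illustrates that coercion; the variable names are hypothetical, and how dcortools parses the vector internally is an assumption.

```r
metr <- c("gaussian", 3)

class(metr)  # character: the numeric 3 has been coerced to a string

# Splitting the specification back into a name and a numeric parameter
metric_name <- metr[1]
metric_prm  <- as.numeric(metr[2])
```

In practice this means the parameter can be written either as a number or as a string when building the vector.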

metr.Y

see metr.X.

use

specifies how to treat missing values. "complete.obs" excludes observations containing NAs, "all" uses all observations.
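The effect of "complete.obs" can be sketched in base R with `complete.cases()`. This assumes that an observation is dropped from both samples when either X or Y contains an NA for it; the names `keep`, `X_cc` and `Y_cc` are illustrative and not part of dcortools.

```r
X <- c(1.2, NA, 0.7, 2.1, 1.5)
Y <- c(0.3, 1.1, NA, 0.9, 1.4)

# TRUE only for observations that are complete in both samples
keep <- complete.cases(X, Y)

X_cc <- X[keep]
Y_cc <- Y[keep]
length(X_cc)  # number of observations that remain
```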

algorithm

specifies the algorithm used for calculating the distance covariance.

"fast" uses an O(n log n) algorithm if the observations are one-dimensional and metr.X and metr.Y are either "euclidean" or "discrete"; see also Huo and Székely (2016).

"memsave" uses a memory saving version of the standard algorithm with computational complexity O(n^2) but requiring only O(n) memory.

"standard" uses the classical algorithm. User-specified metrics always use the classical algorithm.

"auto" chooses the best algorithm for the specific setting using a rule of thumb.
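To illustrate how an O(n^2) computation can get by without storing full distance matrices, the sketch below accumulates the squared sample distance covariance one observation at a time. It is not the package's "memsave" implementation. The helper name `dcov2_lowmem` is hypothetical, and the sketch assumes Euclidean distances and the uncorrected estimator.

```r
# Squared sample distance covariance using only O(n) working memory
# beyond the data: distances from observation i are recomputed in each step.
dcov2_lowmem <- function(X, Y) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  n <- nrow(X)
  tX <- t(X)
  tY <- t(Y)
  s_ab <- 0; s_rows <- 0; s_a <- 0; s_b <- 0
  for (i in seq_len(n)) {
    a_i <- sqrt(colSums((tX - X[i, ])^2))  # distances from X_i to all X_j
    b_i <- sqrt(colSums((tY - Y[i, ])^2))  # distances from Y_i to all Y_j
    s_ab   <- s_ab + sum(a_i * b_i)        # sum of a_ij * b_ij
    s_rows <- s_rows + sum(a_i) * sum(b_i) # sum of row-sum products
    s_a    <- s_a + sum(a_i)               # total of all a_ij
    s_b    <- s_b + sum(b_i)               # total of all b_ij
  }
  s_ab / n^2 - 2 * s_rows / n^3 + s_a * s_b / n^4
}

set.seed(2)
X <- matrix(rnorm(60), nrow = 30, ncol = 2)
Y <- rnorm(30)
dcov2_lowmem(X, Y)
```

The final line of the function uses the expansion of the double-centred product into three sums, which is why no n x n matrix is ever needed.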

References

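Böttcher, B., Keller-Ressel, M. and Schilling, R. L. (2017). Detecting independence of random vectors: generalized distance covariance and Gaussian covariance.

Dueck, J., Edelmann, D., Gneiting, T. and Richards, D. (2014). The affinely invariant distance correlation.

Huo, X. and Székely, G. J. (2016). Fast computing for distance covariance.

Lyons, R. (2013). Distance covariance in metric spaces.

Sejdinovic, D., Sriperumbudur, B., Gretton, A. and Fukumizu, K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing.

Székely, G. J., Rizzo, M. L. and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances.

Székely, G. J. and Rizzo, M. L. (2009). Brownian distance covariance.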

Examples

X <- rnorm(100)
Y <- X + 3 * rnorm(100)
distcov(X, Y) # standard distance covariance

distcov(X, Y, metr.X = "gaussauto", metr.Y = "gaussauto") 
# Gaussian distance with bandwidth choice based on median heuristic

distcov(X, Y, metr.X = c("alpha", 0.5), metr.Y = c("alpha", 0.5)) 
# alpha distance covariance with alpha = 0.5.


# Define a user-specified (slow) version of the alpha metric

alpha_user <- function(X, prm = 1, kernel = FALSE) {
    as.matrix(dist(X)) ^ prm
}

distcov(X, Y, metr.X = c("alpha_user", 0.5), metr.Y = c("alpha_user", 0.5))
# Gives the same result as before.
   

# User-specified Gaussian kernel function (note the negative exponent)

gauss_kernel <- function(X, prm = 1, kernel = TRUE) {
    exp(-as.matrix(dist(X)) ^ 2 / 2 / prm ^ 2)
}

distcov(X, Y, metr.X = c("gauss_kernel", 2), metr.Y = c("gauss_kernel", 2)) 
# calculates the distance covariance using the corresponding kernel-induced metric

distcov(X, Y, metr.X = c("gaussian", 2), metr.Y = c("gaussian", 2)) 
# same result

# Multivariate example: X has three components, Y has two
X <- matrix(rnorm(300), nrow = 100, ncol = 3)
Z <- rnorm(100)
Y <- matrix(nrow = 100, ncol = 2)
Y[, 1] <- X[, 1] + Z
Y[, 2] <- 3 * Z

distcov(X, Y) 

distcov(X, Y, affine = TRUE) 
# affinely invariant distance covariance

distcov(X, Y, standardize = TRUE) 
# distance covariance after standardizing the components of X and Y
