ecdfdistS: Distance Measures between Samples through Empirical Cumulative Distribution Functions

Description

This function measures pairwise distances between the empirical cumulative distribution functions (ECDFs) of data samples. Unlike ecdfdist, it takes raw data samples as input and computes the ECDFs internally before calculating the distances.
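As a rough illustration of what a distance between two ECDFs means, the sketch below computes the Kolmogorov-Smirnov distance between two raw samples using only base R. The helper name ks_distance is hypothetical and is not part of the package; the package's internal computation may differ in its details.

```r
# Hypothetical helper (not from the package): KS distance between two samples.
ks_distance <- function(x, y) {
  Fx <- stats::ecdf(x)                 # ECDF of the first sample
  Fy <- stats::ecdf(y)                 # ECDF of the second sample
  grid <- sort(unique(c(x, y)))        # both step functions only jump at sample points
  max(abs(Fx(grid) - Fy(grid)))        # largest vertical gap between the two ECDFs
}

set.seed(1)
x <- stats::rnorm(50, sd = 2)
y <- stats::runif(50, min = -5)
ks_distance(x, y)
```

Because both ECDFs are right-continuous step functions, evaluating them on the pooled sample points is enough to find the largest gap.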

Usage

ecdfdistS(
  veclist,
  method = c("KS", "Lp", "Wasserstein"),
  p = 1,
  as.dist = FALSE
)

Value

either a dist object or an \((N\times N)\) symmetric matrix of pairwise distances, depending on the as.dist argument.
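The two return forms carry the same information. The sketch below uses a small hand-made symmetric matrix as a stand-in for the matrix output (the values are made up for illustration) and shows how base R converts it to a dist object that clustering functions such as stats::hclust accept.

```r
# Toy symmetric matrix standing in for a pairwise distance matrix (illustrative values).
D <- matrix(c(0.0, 0.2, 0.7,
              0.2, 0.0, 0.6,
              0.7, 0.6, 0.0), nrow = 3, byrow = TRUE)

d  <- stats::as.dist(D)                     # compact "dist" representation (lower triangle)
hc <- stats::hclust(d, method = "average")  # hierarchical clustering on the distances
groups <- stats::cutree(hc, k = 2)          # split the three items into two clusters
groups
```

A dist object is convenient when the result feeds directly into clustering or multidimensional scaling, while the full matrix is easier to index and plot.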

Arguments

veclist: a length \(N\) list of vectors.
method: name of the distance/dissimilarity measure, one of "KS", "Lp", or "Wasserstein". Case insensitive (default: "KS").
p: exponent for Lp or Wasserstein distance (default: p=1).
as.dist: a logical; TRUE to return dist object, FALSE to return an \((N\times N)\) symmetric matrix of pairwise distances (default: FALSE).
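To make the role of p concrete, here is a minimal base-R sketch of the textbook definitions of the Lp distance between two ECDFs and of the one-dimensional Wasserstein distance for samples of equal size. The helper names lp_distance and wasserstein_equal_n are hypothetical, and these are assumptions about the standard formulas rather than a description of the package's internal implementation.

```r
# Hypothetical helpers for illustration; not the package's internals.

# Lp distance: (integral of |F - G|^p dx)^(1/p), computed exactly for step functions.
lp_distance <- function(x, y, p = 1) {
  Fx <- stats::ecdf(x)
  Fy <- stats::ecdf(y)
  grid   <- sort(unique(c(x, y)))      # pooled breakpoints of both ECDFs
  left   <- grid[-length(grid)]        # left end of each interval
  widths <- diff(grid)                 # interval lengths
  sum(abs(Fx(left) - Fy(left))^p * widths)^(1 / p)
}

# Wasserstein-p distance in one dimension, assuming equal sample sizes:
# match sorted values (empirical quantiles) and average the p-th power gaps.
wasserstein_equal_n <- function(x, y, p = 1) {
  stopifnot(length(x) == length(y))
  mean(abs(sort(x) - sort(y))^p)^(1 / p)
}

set.seed(1)
x <- stats::rnorm(50, sd = 2)
y <- stats::runif(50, min = -5)
lp_distance(x, y, p = 1)
wasserstein_equal_n(x, y, p = 1)
lp_distance(x, y, p = 2)
```

Under these definitions the two measures coincide at p = 1, since the area between two ECDFs equals the area between their quantile functions; for larger p they generally differ.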

Examples


# \donttest{
## toy example : 10 samples each from normal and uniform distributions
mylist = list()
for (i in 1:10){
  mylist[[i]] = stats::rnorm(50, sd=2)
}
for (i in 11:20){
  mylist[[i]] = stats::runif(50, min=-5)
}

## compute three distances
d_KS = ecdfdistS(mylist, method="KS")
d_LP = ecdfdistS(mylist, method="Lp")
d_OT = ecdfdistS(mylist, method="Wasserstein")

## visualize
opar = par(no.readonly=TRUE)
par(mfrow=c(1,3), pty="s")
image(d_KS[,nrow(d_KS):1], axes=FALSE, main="Kolmogorov-Smirnov")
image(d_LP[,nrow(d_LP):1], axes=FALSE, main="Lp (p=1)")
image(d_OT[,nrow(d_OT):1], axes=FALSE, main="Wasserstein (p=1)")
par(opar)
# }
