dist: Matrix Distance/Similarity Computation

Description

These functions compute and return the auto-distance/similarity matrix between either rows or columns of a matrix/data frame, or a list, as well as the cross-distance matrix between two matrices/data frames/lists.

Usage

dist(x, y = NULL, method = NULL, ..., diag = FALSE, upper = FALSE,
      by_rows = TRUE, auto_convert = TRUE)
simil(x, y = NULL, method = NULL, ..., diag = FALSE, upper = FALSE,
      by_rows = TRUE, auto_convert = TRUE)
pr_dist2simil(x)
pr_simil2dist(x)
as.dist(x, FUN = NULL)
as.simil(x, FUN = NULL)

Arguments

Value

Auto distances/similarities are returned as an object of class dist/simil and cross-distances/similarities as an object of class crossdist/crosssimil.

Details

The interface is fashioned after dist, but can also compute cross-distances, and allows user extensions by means of registry of all proximity measures (see pr_DB).

Missing values are allowed but are excluded from all computations involving the rows within which they occur. If some columns are excluded in calculating a Euclidean, Manhattan, Canberra or Minkowski distance, the sum is scaled up proportionally to the number of columns used.

Distance measures can be used with simil, and similarity measures with dist. In these cases, the result is transformed accordingly using the specified conversion functions (default: $pr_simil2dist(d) = 1 - s$ and $pr_dist2simil(s) = 1 / (1 - d)$). Objects of class simil and dist can be converted one in another using as.dist and as.simil, respectively.

Distance and similarity objects can conveniently be subsetted (see examples).

References

Anderberg, M.R. (1973), Cluster analysis for applications, 359 pp., Academic Press, New York, NY, USA. Cox, M.F. and Cox, M.A.A. (2001), Multidimensional Scaling, Chapman and Hall. Sokol, R.S. and Sneath P.H.A (1963), Principles of Numerical Taxonomy, W. H. Freeman and Co., San Francisco.

Examples

Run this code

### show available proximities
summary(pr_DB)

### binary data
x <- matrix(sample(c(FALSE, TRUE), 8, rep = TRUE), ncol = 2)
dist(x, method = "Jaccard")

### for real-valued data
dist(x, method = "eJaccard")

### for positive real-valued data
dist(x, method = "fJaccard")

### cross distances
dist(x, x, method = "Jaccard")

### this is the same but less efficient
as.matrix(stats::dist(x, method = "binary"))

### numeric data
x <- matrix(rnorm(16), ncol = 4)

## test inheritance of names
rownames(x) <- LETTERS[1:4]
colnames(x) <- letters[1:4]
dist(x)
dist(x, x)

## custom distance function
f <- function(x, y) sum(x * y)
dist(x, f)

## working with lists
z <- unlist(apply(x, 1, list), recursive = FALSE)
(d <- dist(z))
dist(z, z)

## subsetting
d[[1:2]]
subset(d, c(1,3,4))

## row and column indexes
row.dist(d)
col.dist(d)

Run the code above in your browser using DataLab