These functions compute and return the auto-distance/similarity matrix between either rows or columns of a matrix/data frame, or a list, as well as the cross-distance matrix between two matrices/data frames/lists.

```
dist(x, y = NULL, method = NULL, ..., diag = FALSE, upper = FALSE,
pairwise = FALSE, by_rows = TRUE, convert_similarities = TRUE,
auto_convert_data_frames = TRUE)
simil(x, y = NULL, method = NULL, ..., diag = FALSE, upper = FALSE,
pairwise = FALSE, by_rows = TRUE, convert_distances = TRUE,
auto_convert_data_frames = TRUE)
```pr_dist2simil(x)
pr_simil2dist(x)

as.dist(x, FUN = NULL)
as.simil(x, FUN = NULL)

# S3 method for dist
as.matrix(x, diag = 0, …)
# S3 method for simil
as.matrix(x, diag = NA, …)

x

For `dist`

and `simil`

, a numeric matrix object, a data frame, or a list. A vector
will be converted into a column matrix. For `as.simil`

and
`as.dist`

, an object of class `dist`

and
`simil`

, respectively, or a numeric matrix. For
`pr_dist2simil`

and `pr_simil2dist`

, any numeric vector.

y

`NULL`

, or a similar object than `x`

method

a function, a registry entry, or a mnemonic string referencing the
proximity measure. A list of all available measures can be obtained
using `pr_DB`

(see examples). The default for `dist`

is
`"Euclidean"`

, and for `simil`

`"correlation"`

.

diag

logical value indicating whether the diagonal of the
distance/similarity matrix should be printed by
`print.dist`

/`print.simil`

. Note that the
diagonal values are never stored in `dist`

objects.

In the context of `as.matrix`

the value to use on the diagonal
representing self-proximities. In case of similarities, this
defaults to `NA`

since a priori there are no upper bounds, so
the maximum similarity needs to be specified by the user.

upper

logical value indicating whether the upper triangle of the
distance/similarity matrix should be printed by
`print.dist`

/`print.simil`

pairwise

logical value indicating whether distances should be
computed for the pairs of `x`

and `y`

only.

by_rows

logical indicating whether proximities between rows, or columns should be computed.

convert_similarities, convert_distances

logical indicating whether distances should be automatically converted into similarities (and the other way round) if needed.

auto_convert_data_frames

logical indicating whether data frames should be converted to matrices if all variables are numeric, or all are logical, or all are complex.

FUN

optional function to be used by `as.dist`

and
`as.simil`

. If `NULL`

, it is looked up in the method
registry. If there is none specified there, `FUN`

defaults to
`pr_simil2dist`

and `pr_dist2simil`

, respectively.

…

further arguments passed to the proximity function.

Auto distances/similarities are returned as an object of class `dist`

/`simil`

and
cross-distances/similarities as an object of class `crossdist`

/`crosssimil`

.

The interface is fashioned after `dist`

, but can
also compute cross-distances, and allows user extensions by means of
registry of all proximity measures (see `pr_DB`

).

Missing values are allowed but are excluded from all computations
involving the rows within which they occur. If some columns are
excluded in calculating a Euclidean, Manhattan, Canberra or
Minkowski distance, the sum is scaled up proportionally to the
number of columns used (compare `dist`

in
package stats).

Data frames are silently coerced to matrix if all columns are of
(same) mode `numeric`

or `logical`

.

Distance measures can be used with `simil`

, and similarity
measures with `dist`

. In these cases, the result is transformed
accordingly using the specified coercion functions (default:
\(pr\_simil2dist(x) = 1 - abs(x)\) and \(pr\_dist2simil(x) = 1 / (1 + x)\)).
Objects of class `simil`

and `dist`

can be converted one in
another using `as.dist`

and `as.simil`

, respectively.

Distance and similarity objects can conveniently be subset (see examples). Note that duplicate indexes are silently ignored.

Anderberg, M.R. (1973), *Cluster analysis for applications*,
359 pp., Academic Press, New York, NY, USA.

Cox, M.F. and Cox, M.A.A. (2001), *Multidimensional Scaling*,
Chapman and Hall.

Sokol, R.S. and Sneath P.H.A (1963), *Principles of Numerical
Taxonomy*, W. H. Freeman and Co., San Francisco.

`dist`

for compatibility information, and
`pr_DB`

for the proximity data base.

# NOT RUN { ### show available proximities summary(pr_DB) ### get more information about a particular one pr_DB$get_entry("Jaccard") ### binary data x <- matrix(sample(c(FALSE, TRUE), 8, rep = TRUE), ncol = 2) dist(x, method = "Jaccard") ### for real-valued data dist(x, method = "eJaccard") ### for positive real-valued data dist(x, method = "fJaccard") ### cross distances dist(x, x, method = "Jaccard") ### pairwise (diagonal) dist(x, x, method = "Jaccard", pairwise = TRUE) ### this is the same but less efficient as.matrix(stats::dist(x, method = "binary")) ### numeric data x <- matrix(rnorm(16), ncol = 4) ## test inheritance of names rownames(x) <- LETTERS[1:4] colnames(x) <- letters[1:4] dist(x) dist(x, x) ## custom distance function f <- function(x, y) sum(x * y) dist(x, f) ## working with lists z <- unlist(apply(x, 1, list), recursive = FALSE) (d <- dist(z)) dist(z, z) ## subsetting d[[1:2]] subset(d, c(1,3,4)) d[[c(1,2,2)]] # duplicate index gets ignored ## transformations and self-proximities as.matrix(as.simil(d, function(x) exp(-x)), diag = 1) ## row and column indexes row.dist(d) col.dist(d) # }