# dist

##### Matrix Distance/Similarity Computation

These functions compute and return the auto-distance/similarity matrix between either rows or columns of a matrix/data frame, or a list, as well as the cross-distance matrix between two matrices/data frames/lists.

- Keywords
- cluster

##### Usage

```
dist(x, y = NULL, method = NULL, ..., diag = FALSE, upper = FALSE,
     pairwise = FALSE, by_rows = TRUE, convert_similarities = TRUE,
     auto_convert_data_frames = TRUE)
simil(x, y = NULL, method = NULL, ..., diag = FALSE, upper = FALSE,
      pairwise = FALSE, by_rows = TRUE, convert_distances = TRUE,
      auto_convert_data_frames = TRUE)

pr_dist2simil(x)
pr_simil2dist(x)

as.dist(x, FUN = NULL)
as.simil(x, FUN = NULL)

# S3 method for dist
as.matrix(x, diag = 0, ...)
# S3 method for simil
as.matrix(x, diag = NA, ...)
```

##### Arguments

- x
For `dist` and `simil`, a numeric matrix object, a data frame, or a list. A vector will be converted into a column matrix. For `as.simil` and `as.dist`, an object of class `dist` and `simil`, respectively, or a numeric matrix. For `pr_dist2simil` and `pr_simil2dist`, any numeric vector.

- y
`NULL`, or an object similar to `x`.

- method
a function, a registry entry, or a mnemonic string referencing the proximity measure. A list of all available measures can be obtained using `pr_DB` (see examples). The default for `dist` is `"Euclidean"`, and for `simil` `"correlation"`.

- diag
logical value indicating whether the diagonal of the distance/similarity matrix should be printed by `print.dist`/`print.simil`. Note that the diagonal values are never stored in `dist` objects. In the context of `as.matrix`, the value to use on the diagonal representing self-proximities. For similarities, this defaults to `NA`: a priori there is no upper bound, so the maximum similarity needs to be specified by the user.

- upper
logical value indicating whether the upper triangle of the distance/similarity matrix should be printed by `print.dist`/`print.simil`.

- pairwise
logical value indicating whether distances should be computed for the pairs of `x` and `y` only.

- by_rows
logical indicating whether proximities between rows or between columns should be computed.

- convert_similarities, convert_distances
logical indicating whether distances should be automatically converted into similarities (and the other way round) if needed.

- auto_convert_data_frames
logical indicating whether data frames should be converted to matrices if all variables are numeric, or all are logical, or all are complex.

- FUN
optional function to be used by `as.dist` and `as.simil`. If `NULL`, it is looked up in the method registry. If there is none specified there, `FUN` defaults to `pr_simil2dist` and `pr_dist2simil`, respectively.

- ...
further arguments passed to the proximity function.
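The effect of `y`, `pairwise`, and `by_rows` is easiest to see on a tiny input. The following is a minimal sketch, assuming the proxy package is installed; the matrices `x` and `y` are made-up illustration data, and calls are written as `proxy::dist` only to avoid confusion with `stats::dist`.

```r
library(proxy)

# Illustrative data: three points in the plane, and three copies of the origin
x <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)
y <- matrix(0, nrow = 3, ncol = 2)

# Auto-distances between the rows of x (default method: "Euclidean")
d_rows <- proxy::dist(x)

# Auto-distances between the two columns of x instead
d_cols <- proxy::dist(x, by_rows = FALSE)

# Cross-distances: every row of x against every row of y
d_cross <- proxy::dist(x, y)

# Pairwise: only row i of x against row i of y
d_pair <- proxy::dist(x, y, pairwise = TRUE)

print(d_rows)
print(d_cols)
print(d_cross)
print(d_pair)
```

With `pairwise = TRUE`, `x` and `y` are expected to have the same number of rows, since each row of `x` is matched with the corresponding row of `y`.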

##### Details

The interface is fashioned after `dist` in package stats, but can
also compute cross-distances, and allows user extensions by means of
a registry of all proximity measures (see `pr_DB`).
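As a small illustration of the extension mechanism, a plain R function can be passed as `method`. This sketch assumes the proxy package is installed; the helper `manhattan` and the matrix `x` are hypothetical names introduced here, and the result is compared with the registered `"Manhattan"` measure.

```r
library(proxy)

# Illustrative data: three rows, three columns
x <- matrix(c(1, 2, 3,
              4, 6, 8,
              0, 1, 0), ncol = 3, byrow = TRUE)

# Hypothetical user-defined measure: sum of absolute differences
manhattan <- function(a, b) sum(abs(a - b))

# Same measure, once as a custom function and once by registry name
d_custom <- proxy::dist(x, method = manhattan)
d_named  <- proxy::dist(x, method = "Manhattan")

print(as.matrix(d_custom))
print(as.matrix(d_named))
```

A custom function is called once per pair of rows, so it will generally be slower than a registered measure that is implemented in compiled code.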

Missing values are allowed but are excluded from all computations
involving the rows within which they occur. If some columns are
excluded in calculating a Euclidean, Manhattan, Canberra or
Minkowski distance, the sum is scaled up proportionally to the
number of columns used (compare `dist` in package stats).
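The scaling rule can be checked by hand on a two-row example. This is a sketch with made-up data: it computes the distance with `stats::dist` (the reference the text points to) and with `proxy::dist`, and compares both with the value obtained by applying the rule manually.

```r
library(proxy)

# Illustrative data: the second row has a missing value in column 2
x <- rbind(c(1, 2, 3),
           c(4, NA, 7))

d_stats <- stats::dist(x, method = "euclidean")
d_proxy <- proxy::dist(x, method = "Euclidean")

# Only columns 1 and 3 are usable: squared differences 9 and 16.
# The sum is scaled by 3 / 2 (all columns / columns used) before the square root.
expected <- sqrt((9 + 16) * 3 / 2)

print(c(stats = as.numeric(d_stats),
        proxy = as.numeric(d_proxy),
        by_hand = expected))
```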

Data frames are silently coerced to a matrix if all columns are of
the same mode, `numeric` or `logical`.

Distance measures can be used with `simil`, and similarity
measures with `dist`. In these cases, the result is transformed
accordingly using the specified coercion functions (default:
\(\mathrm{pr\_simil2dist}(x) = 1 - |x|\) and \(\mathrm{pr\_dist2simil}(x) = 1 / (1 + x)\)).
Objects of class `simil` and `dist` can be converted into one
another using `as.dist` and `as.simil`, respectively.
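The default coercion functions and a user-supplied one can be tried directly. This is a minimal sketch, assuming the proxy package is installed; the input vectors, the matrix `x`, and the choice of `exp(-v)` as a custom transformation are illustrative only.

```r
library(proxy)

# Default transformations applied to plain numeric vectors
s2d <- pr_simil2dist(c(-1, 0, 0.5, 1))  # 1 - abs(x)
d2s <- pr_dist2simil(c(0, 1, 3))        # 1 / (1 + x)
print(s2d)
print(d2s)

# Convert a distance object with a custom transformation
x <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)
d <- proxy::dist(x)
s <- as.simil(d, FUN = function(v) exp(-v))

# Self-similarities are not stored, so the diagonal is supplied explicitly
m <- as.matrix(s, diag = 1)
print(m)
```

Because `as.matrix` for `simil` objects defaults to `diag = NA`, the value `1` is passed here as the maximum of `exp(-v)` for non-negative distances.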

Distance and similarity objects can conveniently be subset (see examples). Note that duplicate indexes are silently ignored.

##### Value

Auto distances/similarities are returned as an object of class
`dist`/`simil`, and cross-distances/similarities as an object of
class `crossdist`/`crosssimil`.
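The returned classes can be inspected with `inherits`. This is a short sketch on made-up data, assuming the proxy package is installed.

```r
library(proxy)

# Illustrative data: three non-constant rows, so the default
# "correlation" similarity is well defined for every pair
x <- matrix(c(1, 2, 4,
              2, 1, 3,
              5, 0, 1), ncol = 3, byrow = TRUE)

is_dist       <- inherits(proxy::dist(x), "dist")
is_simil      <- inherits(proxy::simil(x), "simil")
is_crossdist  <- inherits(proxy::dist(x, x), "crossdist")
is_crosssimil <- inherits(proxy::simil(x, x), "crosssimil")

print(c(dist = is_dist, simil = is_simil,
        crossdist = is_crossdist, crosssimil = is_crosssimil))
```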

##### References

Anderberg, M.R. (1973), *Cluster analysis for applications*,
359 pp., Academic Press, New York, NY, USA.

Cox, T.F. and Cox, M.A.A. (2001), *Multidimensional Scaling*,
Chapman and Hall.

Sokal, R.R. and Sneath, P.H.A. (1963), *Principles of Numerical
Taxonomy*, W. H. Freeman and Co., San Francisco.

##### See Also

`dist` in package stats for compatibility information, and
`pr_DB` for the proximity database.

##### Examples

```
library(proxy)
### show available proximities
summary(pr_DB)
### get more information about a particular one
pr_DB$get_entry("Jaccard")
### binary data
x <- matrix(sample(c(FALSE, TRUE), 8, replace = TRUE), ncol = 2)
dist(x, method = "Jaccard")
### for real-valued data
dist(x, method = "eJaccard")
### for positive real-valued data
dist(x, method = "fJaccard")
### cross distances
dist(x, x, method = "Jaccard")
### pairwise (diagonal)
dist(x, x, method = "Jaccard",
     pairwise = TRUE)
### this is the same but less efficient
as.matrix(stats::dist(x, method = "binary"))
### numeric data
x <- matrix(rnorm(16), ncol = 4)
## test inheritance of names
rownames(x) <- LETTERS[1:4]
colnames(x) <- letters[1:4]
dist(x)
dist(x, x)
## custom distance function
f <- function(x, y) sum(x * y)
dist(x, f)
## working with lists
z <- unlist(apply(x, 1, list), recursive = FALSE)
(d <- dist(z))
dist(z, z)
## subsetting
d[[1:2]]
subset(d, c(1,3,4))
d[[c(1,2,2)]] # duplicate index gets ignored
## transformations and self-proximities
as.matrix(as.simil(d, function(x) exp(-x)), diag = 1)
## row and column indexes
row.dist(d)
col.dist(d)
```

*Documentation reproduced from package proxy, version 0.4-22, License: GPL-2*