# dist

##### Distance Matrix Computation

This function computes and returns the distance matrix obtained by applying the specified distance measure to the rows of a data matrix.

- Keywords
- multivariate, cluster

##### Usage

`dist(x, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)`

`as.dist(m, diag = FALSE, upper = FALSE)`

S3 method for class 'default':
`as.dist(m, diag = FALSE, upper = FALSE)`

S3 method for class 'dist':
`print(x, diag = NULL, upper = NULL, digits = getOption("digits"), justify = "none", right = TRUE, ...)`

S3 method for class 'dist':
`as.matrix(x, ...)`

##### Arguments

- x
- a numeric matrix, data frame or `"dist"` object.
- method
- the distance measure to be used. This must be one of `"euclidean"`, `"maximum"`, `"manhattan"`, `"canberra"`, `"binary"` or `"minkowski"`. Any unambiguous substring can be given.
- diag
- logical value indicating whether the diagonal of the distance matrix should be printed by `print.dist`.
- upper
- logical value indicating whether the upper triangle of the distance matrix should be printed by `print.dist`.
- p
- The power of the Minkowski distance.
- m
- An object with distance information to be converted to a `"dist"` object. For the default method, a `"dist"` object, or a matrix (of distances) or an object which can be coerced to such a matrix using `as.matrix()`. (Only the lower triangle of the matrix is used; the rest is ignored.)
- digits, justify
- passed to `format` inside of `print()`.
- right, ...
- further arguments, passed to other methods.
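
The note on argument `m` (only the lower triangle is used) can be checked with a small sketch. The matrix below is deliberately asymmetric and its values are made up for illustration; the 9s above the diagonal should have no effect on the result.

```r
# matrix() fills by column, so the 9s end up above the diagonal
m <- matrix(c(0, 1, 2,
              9, 0, 3,
              9, 9, 0), nrow = 3)

d <- as.dist(m)

# Only m[2,1], m[3,1] and m[3,2] are kept, in column order
lower <- as.numeric(d)

# Converting back yields a symmetric matrix built from the lower triangle
back <- as.matrix(d)
```

Here `lower` holds the three below-diagonal entries and `back[1, 2]` equals `m[2, 1]` rather than `m[1, 2]`.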

##### Details

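Available distance measures are (written for two vectors $x$ and $y$):

- `euclidean`: the usual distance between the two vectors (2 norm), $\sqrt{\sum_i (x_i - y_i)^2}$.
- `maximum`: the maximum distance between two components of $x$ and $y$ (supremum norm), $\max_i |x_i - y_i|$.
- `manhattan`: the absolute distance between the two vectors (1 norm), $\sum_i |x_i - y_i|$.
- `canberra`: $\sum_i |x_i - y_i| / |x_i + y_i|$. Terms with zero numerator and denominator are omitted from the sum and treated as if the values were missing.
- `binary`: the vectors are regarded as binary bits, so non-zero elements are "on" and zero elements are "off". The distance is the proportion of bits in which only one is on amongst those in which at least one is on.
- `minkowski`: the $p$ norm, the $p$th root of the sum of the $p$th powers of the differences of the components, $\left(\sum_i |x_i - y_i|^p\right)^{1/p}$.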
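
As a rough cross-check of the six measures, the sketch below recomputes each one by hand for a pair of made-up, non-negative vectors and compares the result with `dist()`. The variable names are illustrative only. The Canberra line assumes that dropped 0/0 terms are compensated by rescaling the sum to the full vector length, which is consistent with the `2 * (6/5)` answer in the Examples.

```r
# Two illustrative vectors (arbitrary non-negative values)
x <- c(1, 0, 3, 0)
y <- c(4, 0, 0, 2)
m <- rbind(x, y)

euclidean <- sqrt(sum((x - y)^2))
maximum   <- max(abs(x - y))
manhattan <- sum(abs(x - y))

# Canberra: drop 0/0 terms, then rescale to the full number of columns
num  <- abs(x - y)
den  <- abs(x + y)
keep <- !(num == 0 & den == 0)
canberra <- sum(num[keep] / den[keep]) * length(x) / sum(keep)

# Binary: among positions where at least one vector is non-zero,
# the proportion where exactly one of them is non-zero
on_x <- x != 0
on_y <- y != 0
binary <- sum(xor(on_x, on_y)) / sum(on_x | on_y)

# Minkowski with p = 3
p <- 3
minkowski <- sum(abs(x - y)^p)^(1 / p)

# Compare each hand computation with dist()
stopifnot(
  all.equal(euclidean, as.numeric(dist(m, method = "euclidean"))),
  all.equal(maximum,   as.numeric(dist(m, method = "maximum"))),
  all.equal(manhattan, as.numeric(dist(m, method = "manhattan"))),
  all.equal(canberra,  as.numeric(dist(m, method = "canberra"))),
  all.equal(binary,    as.numeric(dist(m, method = "binary"))),
  all.equal(minkowski, as.numeric(dist(m, method = "minkowski", p = p)))
)
```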

Missing values are allowed, and are excluded from all computations involving the rows within which they occur. Further, when `Inf` values are involved, all pairs of values are excluded when their contribution to the distance gave `NaN` or `NA`. If some columns are excluded in calculating a Euclidean, Manhattan, Canberra or Minkowski distance, the sum is scaled up proportionally to the number of columns used. If all pairs are excluded when calculating a particular distance, the value is `NA`.
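
The proportional scaling can be illustrated with a minimal sketch. The vectors are made up, and the hand computation assumes the scaling is applied to the sum before the square root is taken in the Euclidean case.

```r
x <- c(1, 2, NA)
y <- c(2, 4, 5)
m <- rbind(x, y)

# Columns where both values are present
ok <- !is.na(x) & !is.na(y)

# Euclidean: scale the sum of squares by (total columns / usable columns)
eucl_by_hand <- sqrt(sum((x[ok] - y[ok])^2) * length(x) / sum(ok))

# Manhattan: the same scaling applied to the sum of absolute differences
manh_by_hand <- sum(abs(x[ok] - y[ok])) * length(x) / sum(ok)

eucl <- as.numeric(dist(m))
manh <- as.numeric(dist(m, method = "manhattan"))

# When every pair is excluded, the distance is NA
all_missing <- as.numeric(dist(rbind(c(NA, 1), c(1, NA))))
```

With one of three columns missing, both sums are multiplied by 3/2 before any root is taken.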

The `"dist"` method of `as.matrix()` and `as.dist()` can be used for conversion between objects of class `"dist"` and conventional distance matrices.

`as.dist()` is a generic function. Its default method handles objects inheriting from class `"dist"`, or coercible to matrices using `as.matrix()`. Support for classes representing distances (also known as dissimilarities) can be added by providing an `as.matrix()` or, more directly, an `as.dist` method for such a class.
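
A minimal sketch of the first route, supplying an `as.matrix()` method. The class name `"mydiss"`, the constructor `new_mydiss()` and the `values` field are hypothetical and exist only for this illustration.

```r
# Hypothetical class: a list wrapping a square dissimilarity matrix
new_mydiss <- function(mat) structure(list(values = mat), class = "mydiss")

# An as.matrix() method is enough for the default as.dist() method to work
as.matrix.mydiss <- function(x, ...) x$values

labs <- c("a", "b", "c")
obj <- new_mydiss(matrix(c(0, 2, 5,
                           2, 0, 1,
                           5, 1, 0),
                         nrow = 3, dimnames = list(labs, labs)))

d <- as.dist(obj)   # dispatches to the default method, which calls as.matrix()
```

Writing an `as.dist.mydiss` method instead would avoid building the full square matrix, which may matter for large objects.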

##### Value

`dist` returns an object of class `"dist"`.

The lower triangle of the distance matrix is stored by columns in a vector, say `do`. If `n` is the number of observations, i.e., `n <- attr(do, "Size")`, then for $i < j \le n$, the dissimilarity between (row) i and j is `do[n*(i-1) - i*(i-1)/2 + j-i]`. The length of the vector is $n(n-1)/2$, i.e., of order $n^2$.

The object has the following attributes (besides `"class"` equal to `"dist"`):

- Size
- integer, the number of observations in the dataset.
- Labels
- optionally, contains the labels, if any, of the observations of the dataset.
- Diag, Upper
- logicals corresponding to the arguments `diag` and `upper` above, specifying how the object should be printed.
- call
- optionally, the `call` used to create the object.
- method
- optionally, the distance method used; resulting from `dist()`, the (`match.arg()`ed) `method` argument.
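
The index formula can be verified against the full matrix with a short sketch. The helper name `dist_index` is hypothetical, and the data are random.

```r
set.seed(1)
x  <- matrix(rnorm(20), nrow = 5)
do <- dist(x)
n  <- attr(do, "Size")

# Position of the (i, j) dissimilarity, i < j <= n, in the underlying vector
dist_index <- function(i, j, n) n * (i - 1) - i * (i - 1) / 2 + j - i

# Compare every pair against the conventional square matrix
full <- as.matrix(do)
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    stopifnot(all.equal(do[dist_index(i, j, n)], full[i, j],
                        check.attributes = FALSE))
  }
}

# The vector holds n * (n - 1) / 2 entries
stopifnot(length(do) == n * (n - 1) / 2)
```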

##### concept

dissimilarity

##### References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
*The New S Language*.
Wadsworth & Brooks/Cole.

Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979)
*Multivariate Analysis.* Academic Press.

Borg, I. and Groenen, P. (1997)
*Modern Multidimensional Scaling. Theory and Applications.*
Springer.

##### See Also

`daisy` in the cluster package, with more possibilities in the case of *mixed* (continuous / categorical) variables. `hclust`.

##### Examples

`library(stats)`

```
require(graphics)
x <- matrix(rnorm(100), nrow = 5)
dist(x)
dist(x, diag = TRUE)
dist(x, upper = TRUE)
m <- as.matrix(dist(x))
d <- as.dist(m)
stopifnot(d == dist(x))
## Use correlations between variables "as distance"
dd <- as.dist((1 - cor(USJudgeRatings))/2)
round(1000 * dd) # (prints more nicely)
plot(hclust(dd)) # to see a dendrogram of clustered variables
## example of binary and canberra distances.
x <- c(0, 0, 1, 1, 1, 1)
y <- c(1, 0, 1, 1, 0, 1)
dist(rbind(x, y), method = "binary")
## answer 0.4 = 2/5
dist(rbind(x, y), method = "canberra")
## answer 2 * (6/5)
## To find the names
labels(eurodist)
## Examples involving "Inf" :
## 1)
x[6] <- Inf
(m2 <- rbind(x, y))
dist(m2, method = "binary") # warning, answer 0.5 = 2/4
## These all give "Inf":
stopifnot(Inf == dist(m2, method = "euclidean"),
          Inf == dist(m2, method = "maximum"),
          Inf == dist(m2, method = "manhattan"))
## "Inf" is same as very large number:
x1 <- x; x1[6] <- 1e100
stopifnot(dist(cbind(x, y), method = "canberra") ==
    print(dist(cbind(x1, y), method = "canberra")))
## 2)
y[6] <- Inf #-> 6-th pair is excluded
dist(rbind(x, y), method = "binary" ) # warning; 0.5
dist(rbind(x, y), method = "canberra" ) # 3
dist(rbind(x, y), method = "maximum") # 1
dist(rbind(x, y), method = "manhattan") # 2.4
```

*Documentation reproduced from package stats, version 3.3, License: Part of R 3.3*