# adist

##### Approximate String Distances

Compute the approximate string distance between character vectors. The distance is a generalized Levenshtein (edit) distance, giving the minimal possibly weighted number of insertions, deletions and substitutions needed to transform one string into another.

- Keywords
- character

##### Usage

```
adist(x, y = NULL, costs = NULL, counts = FALSE, fixed = TRUE,
partial = !fixed, ignore.case = FALSE, useBytes = FALSE)
```

##### Arguments

- x
- a character vector. Long vectors are not supported.
- y
- a character vector, or
`NULL`

(default) indicating taking`x`

as`y`

. - costs
- a numeric vector or list with names partially matching
`insertions`,`deletions`and`substitutions`giving the respective costs for computing the Levenshtein distance, or`NULL`

(default) indicating using unit cost for all three possible transformations. - counts
- a logical indicating whether to optionally return the
transformation counts (numbers of insertions, deletions and
substitutions) as the
`"counts"`

attribute of the return value. - fixed
- a logical. If
`TRUE`

(default), the`x`

elements are used as string literals. Otherwise, they are taken as regular expressions and`partial = TRUE`

is implied (corresponding to the approximate string distance used by`agrep`

with`fixed = FALSE`

. - partial
- a logical indicating whether the transformed
`x`

elements must exactly match the complete`y`

elements, or only substrings of these. The latter corresponds to the approximate string distance used by`agrep`

(by default). - ignore.case
- a logical. If
`TRUE`

, case is ignored for computing the distances. - useBytes
- a logical. If
`TRUE`

distance computations are done byte-by-byte rather than character-by-character.

##### Details

The (generalized) Levenshtein (or edit) distance between two strings
`s` and `t` is the minimal possibly weighted number of
insertions, deletions and substitutions needed to transform `s`
into `t` (so that the transformation exactly matches `t`).
This distance is computed for `partial = FALSE`

, currently using
a dynamic programming algorithm (see, e.g.,
`s` and `t`, respectively. Additionally computing
the transformation sequence and counts is $O(\max(m, n))$.

The generalized Levenshtein distance can also be used for approximate
(fuzzy) string matching, in which case one finds the substring of
`t` with minimal distance to the pattern `s` (which could be
taken as a regular expression, in which case the principle of using
the leftmost and longest match applies), see, e.g.,
`partial = TRUE`

using `tre` by
Ville Laurikari (`agrep`

. In this
case, the given cost values are coerced to integer.

Note that the costs for insertions and deletions can be different, in
which case the distance between `s` and `t` can be different
from the distance between `t` and `s`.

##### Value

- A matrix with the approximate string distances of the elements of
`x`

and`y`

, with rows and columns corresponding to`x`

and`y`

, respectively.If

`counts`

is`TRUE`

, the transformation counts are returned as the`"counts"`

attribute of this matrix, as a 3-dimensional array with dimensions corresponding to the elements of`x`

, the elements of`y`

, and the type of transformation (insertions, deletions and substitutions), respectively. Additionally, if`partial = FALSE`

, the transformation sequences are returned as the`"trafos"`

attribute of the return value, as character strings with elements`M`,`I`,`D`and`S`indicating a match, insertion, deletion and substitution, respectively. If`partial = TRUE`

, the offsets (positions of the first and last element) of the matched substrings are returned as the`"offsets"`

attribute of the return value (with both offsets $-1$ in case of no match).

##### See Also

`agrep`

for approximate string matching (fuzzy matching)
using the generalized Levenshtein distance.

##### Examples

`library(utils)`

```
## Cf. https://en.wikipedia.org/wiki/Levenshtein_distance
adist("kitten", "sitting")
## To see the transformation counts for the Levenshtein distance:
drop(attr(adist("kitten", "sitting", counts = TRUE), "counts"))
## To see the transformation sequences:
attr(adist(c("kitten", "sitting"), counts = TRUE), "trafos")
## Cf. the examples for agrep:
adist("lasy", "1 lazy 2")
## For a "partial approximate match" (as used for agrep):
adist("lasy", "1 lazy 2", partial = TRUE)
```

*Documentation reproduced from package utils, version 3.3, License: Part of R 3.3*