Compute the approximate string distance between character vectors. The distance is a generalized Levenshtein (edit) distance, giving the minimal possibly weighted number of insertions, deletions and substitutions needed to transform one string into another.
adist(x, y = NULL, costs = NULL, counts = FALSE, fixed = TRUE,
      partial = !fixed, ignore.case = FALSE, useBytes = FALSE)a character vector. Long vectors are not supported.
a character vector, or NULL (default) indicating
    taking x as y.
a numeric vector or list with names partially matching
    insertions, deletions and substitutions giving
    the respective costs for computing the Levenshtein distance, or
    NULL (default) indicating using unit cost for all three
    possible transformations.
a logical indicating whether to optionally return the
    transformation counts (numbers of insertions, deletions and
    substitutions) as the "counts" attribute of the return
    value.
a logical.  If TRUE (default), the x
    elements are used as string literals.  Otherwise, they are taken as
    regular expressions and partial = TRUE is implied
    (corresponding to the approximate string distance used by
    agrep with fixed = FALSE).
a logical indicating whether the transformed x
    elements must exactly match the complete y elements, or only
    substrings of these.  The latter corresponds to the approximate
    string distance used by agrep (by default).
a logical.  If TRUE, case is ignored for
    computing the distances.
a logical.  If TRUE distance computations are
    done byte-by-byte rather than character-by-character.
A matrix with the approximate string distances of the elements of
  x and y, with rows and columns corresponding to x
  and y, respectively.
If counts is TRUE, the transformation counts are
  returned as the "counts" attribute of this matrix, as a
  3-dimensional array with dimensions corresponding to the elements of
  x, the elements of y, and the type of transformation
  (insertions, deletions and substitutions), respectively.
  Additionally, if partial = FALSE, the transformation sequences
  are returned as the "trafos" attribute of the return value, as
  character strings with elements M, I, D and
  S indicating a match, insertion, deletion and substitution,
  respectively.  If partial = TRUE, the offsets (positions of
  the first and last element) of the matched substrings are returned as
  the "offsets" attribute of the return value (with both offsets
  \(-1\) in case of no match).
The (generalized) Levenshtein (or edit) distance between two strings
  s and t is the minimal possibly weighted number of
  insertions, deletions and substitutions needed to transform s
  into t (so that the transformation exactly matches t).
  This distance is computed for partial = FALSE, currently using
  a dynamic programming algorithm (see, e.g.,
  https://en.wikipedia.org/wiki/Levenshtein_distance) with space
  and time complexity \(O(mn)\), where \(m\) and \(n\) are the
  lengths of s and t, respectively.  Additionally computing
  the transformation sequence and counts is \(O(\max(m, n))\).
The generalized Levenshtein distance can also be used for approximate
  (fuzzy) string matching, in which case one finds the substring of
  t with minimal distance to the pattern s (which could be
  taken as a regular expression, in which case the principle of using
  the leftmost and longest match applies), see, e.g.,
  https://en.wikipedia.org/wiki/Approximate_string_matching.  This
  distance is computed for partial = TRUE using tre by
  Ville Laurikari (http://laurikari.net/tre/) and
  corresponds to the distance used by agrep.  In this
  case, the given cost values are coerced to integer.
Note that the costs for insertions and deletions can be different, in which case the distance between s and t can be different from the distance between t and s.
agrep for approximate string matching (fuzzy matching)
  using the generalized Levenshtein distance.
# NOT RUN {
## Cf. https://en.wikipedia.org/wiki/Levenshtein_distance
adist("kitten", "sitting")
## To see the transformation counts for the Levenshtein distance:
drop(attr(adist("kitten", "sitting", counts = TRUE), "counts"))
## To see the transformation sequences:
attr(adist(c("kitten", "sitting"), counts = TRUE), "trafos")
## Cf. the examples for agrep:
adist("lasy", "1 lazy 2")
## For a "partial approximate match" (as used for agrep):
adist("lasy", "1 lazy 2", partial = TRUE)
# }
Run the code above in your browser using DataLab