agrep
Approximate String Matching (Fuzzy Matching)
Searches for approximate matches to pattern (the first argument) within each element of the string x (the second argument) using the generalized Levenshtein edit distance (the minimal possibly weighted number of insertions, deletions and substitutions needed to transform one string into another).
 Keywords
 character
Usage
agrep(pattern, x, max.distance = 0.1, costs = NULL, ignore.case = FALSE, value = FALSE, fixed = TRUE, useBytes = FALSE)
agrepl(pattern, x, max.distance = 0.1, costs = NULL, ignore.case = FALSE, fixed = TRUE, useBytes = FALSE)
Arguments
 pattern
 a non-empty character string or a character string containing a regular expression (for fixed = FALSE) to be matched. Coerced by as.character to a string if possible.
 x
 character vector where matches are sought. Coerced by as.character to a character vector if possible.
 max.distance
 Maximum distance allowed for a match. Expressed either as an integer, or as a fraction of the pattern length times the maximal transformation cost (will be replaced by the smallest integer not less than the corresponding fraction), or as a list with possible components
 cost: maximum number/fraction of match cost (generalized Levenshtein distance)
 all: maximal number/fraction of all transformations (insertions, deletions and substitutions)
 insertions: maximum number/fraction of insertions
 deletions: maximum number/fraction of deletions
 substitutions: maximum number/fraction of substitutions
 If cost is not given, all defaults to 10%, and the other transformation number bounds default to all. The component names can be abbreviated.
 costs
 a numeric vector or list with names partially matching insertions, deletions and substitutions giving the respective costs for computing the generalized Levenshtein distance, or NULL (default) indicating unit cost for all three possible transformations. Coerced to integer via as.integer if possible.
 ignore.case
 if FALSE, the pattern matching is case sensitive; if TRUE, case is ignored during matching.
 value
 if FALSE, a vector containing the (integer) indices of the matches is returned; if TRUE, a vector containing the matching elements themselves is returned.
 fixed
 logical. If TRUE (default), the pattern is matched literally (as is). Otherwise, it is matched as a regular expression.
 useBytes
 logical. In a multibyte locale, should the comparison be character-by-character (the default) or byte-by-byte?
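The different forms of max.distance are easiest to compare side by side. The following is a minimal sketch rather than a description of the internals: max_edits is a hypothetical helper (not part of base R) that simply restates the documented rule for fractional bounds, and the actual computation inside agrep may differ in detail.

```r
# Hypothetical helper, not part of base R: restates the documented rule that a
# fractional max.distance is scaled by the pattern length and the maximal
# transformation cost, then rounded up to the next integer.
max_edits <- function(pattern, fraction = 0.1, costs = c(1, 1, 1)) {
  ceiling(fraction * nchar(pattern) * max(costs))
}

max_edits("lasy")         # 0.1 * 4 = 0.4, rounded up
max_edits("approximate")  # 0.1 * 11 = 1.1, rounded up

# max.distance given as a fraction (the default), as an integer,
# and as a list of bounds on individual transformations.
by_fraction <- agrepl("lasy", "1 lazy 2", max.distance = 0.1)
by_integer  <- agrepl("lasy", "1 lazy 2", max.distance = 1)
by_list     <- agrepl("lasy", "1 lazy 2",
                      max.distance = list(all = 1, substitutions = 1))
```

The list form is the one to reach for when a particular kind of edit should be restricted, for example allowing insertions and deletions but no substitutions.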
Details
The Levenshtein edit distance is used as measure of approximateness: it is the (possibly cost-weighted) total number of insertions, deletions and substitutions required to transform one string into another.
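To make the distance concrete, it can be computed directly with adist from package utils. This is an illustrative sketch; the strings and the substitution cost of 3 are arbitrary choices made for the example.

```r
# Unit costs: one substitution (s -> z) turns "lasy" into "lazy".
d_unit <- utils::adist("lasy", "lazy")

# Unit costs: the classic kitten/sitting pair needs three edits.
d_classic <- utils::adist("kitten", "sitting")

# Weighted costs: with substitutions priced at 3, deleting "s" and
# inserting "z" (total cost 2) is cheaper than substituting.
d_weighted <- utils::adist("lasy", "lazy", costs = list(substitutions = 3))

print(c(d_unit, d_classic, d_weighted))
```

The weighted case shows why the distance is described as the minimal weighted number of edits: changing the costs can change which sequence of edits is cheapest.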
This uses tre by Ville Laurikari (http://laurikari.net/tre/), which supports MBCS character matching.
The main effect of useBytes is to avoid errors/warnings about invalid inputs and spurious matches in multibyte locales. It inhibits the conversion of inputs with marked encodings, and is forced if any input is found which is marked as "bytes" (see Encoding).
Value

agrep returns a vector giving the indices of the elements that yielded a match, or, if value is TRUE, the matched elements (after coercion, preserving names but no other attributes).
agrepl returns a logical vector.
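The three return shapes can be compared on the same input. A small sketch, reusing the strings from the examples on this page:

```r
x <- c("1 lazy", "1", "1 LAZY")

# Integer indices of the matching elements.
idx <- agrep("laysy", x, max.distance = 2)

# The matching elements themselves.
vals <- agrep("laysy", x, max.distance = 2, value = TRUE)

# One logical per element of x.
hits <- agrepl("laysy", x, max.distance = 2)

# The logical vector carries the same information as the indices.
identical(which(hits), idx)
```

The logical form is convenient for subsetting data frames or combining several conditions, while the index form mirrors what grep returns.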
Note
Since someone who read the description carelessly even filed a bug report on it, do note that this matches substrings of each element of x (just as grep does) and not whole elements. See also adist in package utils, which optionally returns the offsets of the matched substrings.
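The difference between substring and whole-element matching can be seen in a short sketch. The input strings and the threshold of one edit are assumptions chosen for illustration; filtering on adist is one possible way to require a whole-element match, not the only one.

```r
x <- c("lazy", "a lazy dog sleeps here")

# agrepl matches substrings, so the long element also matches.
sub_hits <- agrepl("lazy", x)

# Whole-element distances from adist: the second element is far away.
whole_dist <- as.vector(utils::adist("lazy", x))

# Keep only elements that are within one edit of the pattern as a whole.
whole_hits <- x[whole_dist <= 1]

# partial = TRUE makes adist compare against the best-matching substring,
# which is the same notion of matching that agrep uses.
partial_dist <- as.vector(utils::adist("lazy", x, partial = TRUE))
```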
See Also
Examples
library(base)
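# One substitution (s -> z) is within the default max.distance
agrep("lasy", "1 lazy 2")
# Forbid substitutions via a list bound (component names may be abbreviated)
agrep("lasy", c(" 1 lazy 2", "1 lasy 2"), max.distance = list(sub = 0))
# Allow up to two edits; matching is case sensitive by default
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2)
# Return the matching elements instead of their indices
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2, value = TRUE)
# Ignore case while matching
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2, ignore.case = TRUE)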
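
A further sketch contrasts fixed = TRUE with fixed = FALSE. The pattern below is an assumption chosen for illustration: as a regular expression its bracket expression accepts either "s" or "z", while as a literal string the brackets are ordinary characters that would have to be edited away.

```r
x <- "1 lazy 2"

# fixed = FALSE: the pattern is a regular expression, so [sz] accepts "z".
as_regex <- agrepl("la[sz]y", x, fixed = FALSE)

# fixed = TRUE (default): the brackets are literal characters, and the
# default max.distance allows too few edits to bridge the difference.
as_literal <- agrepl("la[sz]y", x, fixed = TRUE)
```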