Searches for approximate matches to
pattern (the first argument)
within each element of the string
x (the second argument) using
the generalized Levenshtein edit distance (the minimal possibly
weighted number of insertions, deletions and substitutions needed to
transform one string into another).
agrep(pattern, x, max.distance = 0.1, costs = NULL, ignore.case = FALSE, value = FALSE, fixed = TRUE, useBytes = FALSE)
agrepl(pattern, x, max.distance = 0.1, costs = NULL, ignore.case = FALSE, fixed = TRUE, useBytes = FALSE)
pattern: a non-empty character string or a character string
containing a regular expression (for
fixed = FALSE) to be matched. Coerced by
as.character to a string if possible.
x: character vector where matches are sought. Coerced by
as.character to a character vector if possible.
max.distance: Maximum distance allowed for a match. Expressed either as an integer, or as a fraction of the pattern length times the maximal transformation cost (will be replaced by the smallest integer not less than the corresponding fraction), or as a list with possible components:
cost: maximum number/fraction of match cost (generalized Levenshtein distance)
all: maximal number/fraction of all transformations (insertions, deletions and substitutions)
insertions: maximum number/fraction of insertions
deletions: maximum number/fraction of deletions
substitutions: maximum number/fraction of substitutions
If cost is not given,
all defaults to 10%, and the
other transformation number bounds default to all.
The component names can be abbreviated.
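The fraction rule can be illustrated with a short sketch. It assumes unit costs, so the maximal transformation cost is 1; the variable names (bound, by_fraction and so on) are made up for this illustration.

```r
# Illustrative sketch: how a fractional max.distance maps to an integer bound.
pattern <- "lasy"
x <- c(" 1 lazy 2", "1 lasy 2")

# With unit costs the maximal transformation cost is 1, so the default
# fraction 0.1 becomes ceiling(0.1 * 4 * 1) = 1 allowed edit.
bound <- ceiling(0.1 * nchar(pattern) * 1)

by_fraction <- agrep(pattern, x, max.distance = 0.1)
by_integer  <- agrep(pattern, x, max.distance = bound)

# A list bounds individual transformation types; "sub" abbreviates
# "substitutions". Forbidding substitutions leaves only the exact "lasy".
no_subs <- agrep(pattern, x, max.distance = list(sub = 0))
```

For this four-character pattern the default of 0.1 and an explicit bound of 1 should select the same elements, while the list form excludes the element that needs a substitution.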
costs: a numeric vector or list with names partially matching
insertions, deletions and substitutions giving
the respective costs for computing the generalized Levenshtein
distance, or NULL (default) indicating using unit cost for
all three possible transformations.
Coerced to integer via
as.integer if possible.
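As a hedged illustration of weighted costs, the sketch below makes substitutions twice as expensive as insertions or deletions. The weights and variable names are hypothetical, chosen only for this example.

```r
x <- "1 lazy 2"

# Unit costs (the default): the single substitution s -> z costs 1.
unit <- agrep("lasy", x, max.distance = 1)

# Hypothetical weighting: a substitution costs 2, an insertion or deletion 1.
weights <- c(insertions = 1, deletions = 1, substitutions = 2)

# Same bound, weighted costs: the cheapest transformation now costs 2,
# so a bound of 1 would be expected to reject the element.
tight <- agrep("lasy", x, max.distance = 1, costs = weights)

# Raising the bound to 2 makes room for the weighted substitution.
loose <- agrep("lasy", x, max.distance = 2, costs = weights)
```

Comparing tight and loose shows how the cost bound in max.distance interacts with the weights given in costs.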
ignore.case: if FALSE, the pattern matching is case
sensitive and if
TRUE, case is ignored during matching.
value: if FALSE, a vector containing the (integer)
indices of the matches determined is returned and if TRUE, a
vector containing the matching elements themselves is returned.
fixed: logical. If TRUE (default), the pattern is
matched literally (as is). Otherwise, it is matched as a regular
expression.
useBytes: logical. In a multibyte locale, should the comparison be character-by-character (the default) or byte-by-byte?
agrep returns a vector giving the indices of the elements that
yielded a match, or, if value is
TRUE, the matched
elements (after coercion, preserving names but no other attributes).
agrepl returns a logical vector.
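The three return forms can be compared side by side. This is a minimal sketch reusing one of the inputs from the examples; the variable names idx, vals and flags are illustrative.

```r
x <- c("1 lazy", "1", "1 LAZY")

# Integer indices of the matching elements.
idx <- agrep("laysy", x, max.distance = 2)

# The matching elements themselves.
vals <- agrep("laysy", x, max.distance = 2, value = TRUE)

# One logical per element of x.
flags <- agrepl("laysy", x, max.distance = 2)

# The three forms carry the same information.
same_values  <- identical(vals, x[idx])
same_indices <- identical(which(flags), idx)
```

The logical form from agrepl is convenient for subsetting or filtering, since it always has the same length as x.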
The Levenshtein edit distance is used as measure of approximateness: it is the (possibly cost-weighted) total number of insertions, deletions and substitutions required to transform one string into another.
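To see the distance itself rather than just the match result, one option is adist() from the utils package. It is not part of this help page, so treat the sketch below as a side illustration; the variable names are made up.

```r
# adist() lives in utils, which is attached by default.

# Classic example: kitten -> sitting takes 3 edits.
kitten <- adist("kitten", "sitting")

# Whole-string distance: every character of the longer string counts.
full <- adist("lasy", "1 lazy 2")

# Partial distance: the best match of the pattern against any substring,
# which is the notion of matching that agrep relies on.
partial <- adist("lasy", "1 lazy 2", partial = TRUE)
```

The gap between full and partial shows why agrep can match a short pattern inside a much longer element: only the best-matching substring has to fall within max.distance.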
This uses tre by Ville Laurikari
(http://laurikari.net/tre/), which supports MBCS
character matching.
The main effect of
useBytes is to avoid errors/warnings about
invalid inputs and spurious matches in multibyte locales.
It inhibits the conversion of inputs with marked encodings, and is
forced if any input is found which is marked as "bytes".
agrep("lasy", "1 lazy 2")
agrep("lasy", c(" 1 lazy 2", "1 lasy 2"), max = list(sub = 0))
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2)
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, value = TRUE)
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, ignore.case = TRUE)