aregexec
Approximate String Match Positions
Determine positions of approximate string matches.
- Keywords
- character
Usage
aregexec(pattern, text, max.distance = 0.1, costs = NULL,
ignore.case = FALSE, fixed = FALSE, useBytes = FALSE)
Arguments
- pattern
a non-empty character string, or a character string containing a regular expression (for fixed = FALSE), to be matched. Coerced by as.character to a string if possible.
- text
character vector where matches are sought. Coerced by as.character to a character vector if possible.
- max.distance
maximum distance allowed for a match. See agrep.
- costs
cost of transformations. See agrep.
- ignore.case
a logical. If TRUE, case is ignored for computing the distances.
- fixed
If TRUE, the pattern is matched literally (as is). Otherwise (default), it is matched as a regular expression.
- useBytes
a logical. If TRUE, comparisons are byte-by-byte rather than character-by-character.
Details
aregexec provides a different interface to approximate string matching than agrep (along the lines of the interfaces to exact string matching provided by regexec and grep).
Note that by default, agrep performs literal matches, whereas aregexec performs regular expression matches.
See agrep and adist for more information about approximate string matching and distances.
Comparisons are byte-by-byte if pattern or any element of text is marked as "bytes".
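The difference in defaults can be seen with a pattern that contains regular expression metacharacters. The following is a minimal sketch, not part of the original examples; the pattern "l..y" and the input "lazy" are chosen purely for illustration, and an integer max.distance of 1 is assumed to mean at most one edit.

```r
x <- "lazy"

# agrep() treats the pattern literally by default (fixed = TRUE), so each
# "." has to be substituted: two edits, which exceeds the limit of 1.
lit <- agrep("l..y", x, max.distance = 1)

# aregexec() treats the pattern as a regular expression by default
# (fixed = FALSE), so "." matches any character and no edits are needed.
rx <- aregexec("l..y", x, max.distance = 1)

# Passing fixed = TRUE makes aregexec() match literally, like agrep().
rx_lit <- aregexec("l..y", x, max.distance = 1, fixed = TRUE)

length(lit) > 0       # did agrep() find a match?
rx[[1]][1] != -1      # did aregexec() find a match (regular expression)?
rx_lit[[1]][1] != -1  # did aregexec() find a match (literal)?
```

When porting code between the two functions, setting fixed explicitly avoids relying on either default.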
Value
A list of the same length as text, each element of which is either -1 if there is no match, or a sequence of integers with the starting positions of the match and all substrings corresponding to parenthesized subexpressions of pattern, with attribute "match.length", an integer vector giving the lengths of the matches (or -1 for no match).
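As a rough illustration of this return structure, the sketch below inspects the result for the vector used in the Examples section. The variable names starts and lengths are introduced here for illustration only; substring is the base R function.

```r
x <- c("1 lazy", "1", "1 LAZY")
m <- aregexec("(lay)(sy)", x, max.distance = 2)

length(m) == length(x)   # one list element per element of text

# Matching element: a position for the whole match, then one for each
# parenthesized subexpression, with the lengths in "match.length".
starts  <- as.integer(m[[1]])
lengths <- attr(m[[1]], "match.length")

# Rebuild the matched substrings by hand from positions and lengths.
substring(x[1], starts, starts + lengths - 1)

# Non-matching element: -1, with "match.length" also -1.
m[[2]]
```

In practice regmatches does this extraction directly; the manual version is shown only to make the meaning of the positions and lengths concrete.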
See Also
regmatches for extracting the matched substrings.
Examples
library(utils)
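## Cf. the examples for agrep.
x <- c("1 lazy", "1", "1 LAZY")
## Only the first element is within distance 2 of "laysy".
aregexec("laysy", x, max.distance = 2)
## Parenthesized subexpressions add their own positions and lengths.
aregexec("(lay)(sy)", x, max.distance = 2)
## Ignoring case lets the third element match as well.
aregexec("(lay)(sy)", x, max.distance = 2, ignore.case = TRUE)
## Extract the matched substrings.
m <- aregexec("(lay)(sy)", x, max.distance = 2)
regmatches(x, m)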