grepl2 indicates whether a string matches the corresponding pattern
or not.
grepv2 returns a subset of x matching the corresponding
patterns. Its replacement version allows for substituting such a subset with
new content.
grepl2(x, pattern, ..., ignore_case = FALSE, fixed = FALSE, invert = FALSE)
grepv2(x, pattern, ..., ignore_case = FALSE, fixed = FALSE, invert = FALSE)
grepv2(x, pattern, ..., ignore_case = FALSE, fixed = FALSE, invert = FALSE) <- value
grepl(
pattern,
x,
...,
ignore.case = FALSE,
fixed = FALSE,
invert = FALSE,
perl = FALSE,
useBytes = FALSE
)
grep(
pattern,
x,
...,
ignore.case = FALSE,
fixed = FALSE,
value = FALSE,
invert = FALSE,
perl = FALSE,
useBytes = FALSE
)
grepl2 and [DEPRECATED] grepl return a logical vector.
They preserve the attributes of the longest inputs (unless they are
dropped due to coercion). Missing values in the inputs are propagated
consistently.
grepv2 and [DEPRECATED] grep with value=TRUE return
a subset of x with elements matching the corresponding patterns.
[DEPRECATED] grep with value=FALSE returns the indexes
in x where a match occurred.
Missing values are not included in the outputs and only the names
attribute is preserved, because the length of the result may be different
than that of x.
The replacement version of grepv2 modifies x 'in-place'.
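The return values described above can be inspected with a small sketch like the one below. It assumes the stringx package is installed (the code is skipped otherwise), and the named input vector is a hypothetical example, not taken from the manual.

```r
# Assumption: the stringx package is installed; otherwise nothing is run.
if (requireNamespace("stringx", quietly = TRUE)) {
    x <- c(a = "abc", b = NA, c = "123")  # hypothetical named input with an NA

    res <- stringx::grepl2(x, "\\p{L}")   # does each element contain a letter?
    print(res)         # inspect how the names and the missing value carry over

    sub <- stringx::grepv2(x, "\\p{L}")   # only the matching elements
    print(sub)         # per the description above, NA elements are left out
}
```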
x: character vector whose elements are to be examined
pattern: character vector of nonempty search patterns;
for grepv2 and grep, must not be longer than x
...: further arguments to stri_detect,
e.g., max_count, locale, dotall
ignore_case, ignore.case: single logical value; indicates whether matching should be case-insensitive
fixed: single logical value;
FALSE for matching with regular expressions
(see about_search_regex);
TRUE for fixed pattern matching
(about_search_fixed);
NA for the Unicode collation algorithm
(about_search_coll)
invert: single logical value; indicates whether the non-matching elements are of interest instead
value: character vector of replacement strings (in the replacement version of grepv2)
or, in grep, a single logical value
indicating whether the strings in x matching
patterns (TRUE) or their indexes in x (FALSE) should be returned
perl, useBytes: not used (with a warning if attempting to do so) [DEPRECATED]
grepl and grep are [DEPRECATED] replacements for base
grep and grepl
implemented with stri_detect.
there are inconsistencies between the argument order and naming
in grepl, strsplit,
and startsWith (amongst others); e.g.,
where the needle can precede the haystack, the use of the forward
pipe operator, |>, is less convenient
[fixed by introducing grepl2]
the base R implementation is not portable as it is based on
the system PCRE or TRE library
(e.g., some Unicode classes may not be available or matching thereof
can depend on the current LC_CTYPE category)
[fixed here]
not suitable for natural language processing
[fixed here -- use fixed=NA]
two different regular expression libraries are used
(and historically, ERE was used in place of TRE)
[here, only the ICU Java-like regular expression engine
is available, hence the perl argument has no meaning]
not vectorised w.r.t. pattern
[fixed here; however, in grep, pattern cannot be
longer than x]
missing values in haystack will result in a no-match
[fixed in grepl; see Value]
ignore.case=TRUE cannot be used with fixed=TRUE
[fixed here]
no attributes are preserved [fixed here; see Value]
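Some of the base R behaviours listed above can be reproduced with a short sketch. It calls the base functions explicitly via base::, so it runs whether or not stringx is attached; the input vector is a hypothetical example.

```r
# Base R only: the functions are called via base:: on purpose.
x <- c(u = "abc", v = NA, w = "xyz")   # hypothetical named input with an NA

# A missing haystack element is reported as a no-match (FALSE, not NA),
# and the names of x are not carried over to the result:
r <- base::grepl("a", x)
print(r)
print(names(r))

# ignore.case = TRUE is not honoured together with fixed = TRUE;
# base R signals a warning, which is captured here as a message string:
w <- tryCatch(
    base::grepl("ABC", "abc", ignore.case = TRUE, fixed = TRUE),
    warning = function(cond) conditionMessage(cond)
)
print(w)
```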
These functions are fully vectorised with respect to x and
pattern.
The [DEPRECATED] grepl simply calls
grepl2, which has a cleaned-up argument list.
The [DEPRECATED] grep with value=FALSE is actually redundant --
it can be trivially reproduced with grepl and
which.
grepv2 and grep with value=TRUE combine
pattern matching and subsetting, which some users may find convenient
in conjunction with the forward pipe operator, |>.
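The equivalences mentioned above can be checked with a minimal sketch. It uses the base R functions (called via base::) and a hypothetical input vector, so it does not depend on stringx being installed; the stringx versions are documented above to differ in how they treat missing values and attributes.

```r
x <- c("apple", "kiwi", NA, "banana")   # hypothetical data

# Indexes of matches: grep() versus grepl() followed by which()
idx_grep  <- base::grep("an", x)
idx_which <- which(base::grepl("an", x))
print(identical(idx_grep, idx_which))

# Matching elements: grep(value = TRUE) versus logical subsetting
val_grep   <- base::grep("an", x, value = TRUE)
val_subset <- x[base::grepl("an", x)]
print(identical(val_grep, val_subset))
```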
The official online manual of stringx at https://stringx.gagolewski.com/
Related function(s): paste, nchar,
strsplit, gsub2,
gregexpr2, gregextr2,
gsubstr
x <- c("abc", "1237", "\U0001f602", "\U0001f603", "stringx\U0001f970", NA)
grepl2(x, "\\p{L}")
which(grepl2(x, "\\p{L}")) # like grep
# at least 1 letter or digit:
p <- c("\\p{L}", "\\p{N}")
`dimnames<-`(outer(x, p, grepl2), list(x, p))
x |> grepv2("\\p{L}")
grepv2(x, "\\p{L}", invert=TRUE) <- "\U0001F496"
print(x)