
fixed(): Compare literal bytes in the string. This is very fast, but not usually what you want for non-ASCII character sets.
coll(): Compare strings respecting standard collation rules.
regex(): The default. Uses ICU regular expressions.
boundary(): Match boundaries between characters, lines, sentences, or words.
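The difference between byte-wise and collation-aware comparison shows up when the same letter can be encoded in more than one way. The following is a minimal sketch, assuming stringr is installed; the variable names a1 and a2 are illustrative only.

```r
library(stringr)

# Two encodings of "a with acute accent"
a1 <- "\u00e1"   # single precomposed code point
a2 <- "a\u0301"  # "a" followed by a combining acute accent

# fixed() compares bytes, so the two encodings do not match
str_detect(a1, fixed(a2))

# coll() applies collation rules, which treat the two as equivalent
str_detect(a1, coll(a2))
```

coll() is slower than fixed(), so it is worth reaching for only when the input may contain non-ASCII text.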
fixed(pattern, ignore_case = FALSE)

coll(pattern, ignore_case = FALSE, locale = "en", ...)

regex(
  pattern,
  ignore_case = FALSE,
  multiline = FALSE,
  comments = FALSE,
  dotall = FALSE,
  ...
)

boundary(
  type = c("character", "line_break", "sentence", "word"),
  skip_word_none = NA,
  ...
)
pattern: Pattern to modify behaviour.
ignore_case: Should case differences be ignored in the match?
locale: Locale to use for comparisons. See stringi::stri_locale_list() for all possible options. Defaults to "en" (English) to ensure that the default collation is consistent across platforms.
...: Other less frequently used arguments passed on to stringi::stri_opts_collator(), stringi::stri_opts_regex(), or stringi::stri_opts_brkiter().
multiline: If TRUE, ^ and $ match the beginning and end of each line. If FALSE, the default, they only match the start and end of the input.
comments: If TRUE, white space and comments beginning with # are ignored. Escape literal spaces with "\\ ".
dotall: If TRUE, . will also match line terminators.
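The comments option makes longer regular expressions easier to read. The sketch below assumes stringr is installed; the pattern named phone is a hypothetical example for strings such as "555-1234", not part of the package.

```r
library(stringr)

# With comments = TRUE, white space and "#" comments inside the pattern
# are ignored, so the pattern can be laid out over several lines.
phone <- regex("
  (\\d{3})  # first three digits
  -         # separator
  (\\d{4})  # last four digits
", comments = TRUE)

str_detect(c("555-1234", "5551234"), phone)

# An unescaped space is dropped, so this pattern is effectively "ab"
str_detect("a b", regex("a b", comments = TRUE))

# An escaped space is kept as a literal space
str_detect("a b", regex("a\\ b", comments = TRUE))
```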
type: Boundary type to detect.
  character: Every character is a boundary.
  line_break: Boundaries are places where it is acceptable to have a line break in the current locale.
  sentence: The beginnings and ends of sentences are boundaries, using intelligent rules to avoid counting abbreviations.
  word: The beginnings and ends of words are boundaries.
skip_word_none: Ignore "words" that don't contain any letters or numbers, such as punctuation and white space. The default, NA, skips such "words" only when splitting on word boundaries.
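The boundary types and the effect of skip_word_none can be compared side by side. This is a minimal sketch, assuming stringr is installed; the sample text x is made up for illustration.

```r
library(stringr)

x <- "These are some words. Here is another sentence."

# Sentence boundaries split the text into sentences
str_split(x, boundary("sentence"))[[1]]

# Character boundaries count each character
str_count("abc", boundary("character"))

# By default, word boundaries skip pieces that consist only of
# white space or punctuation
str_count(x, boundary("word"))

# With skip_word_none = FALSE those pieces are counted as well
str_count(x, boundary("word", skip_word_none = FALSE))
```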
str_wrap() for breaking text to form paragraphs
stringi::stringi-search-boundaries for more detail on the various boundaries
library(stringr)

# "." is a wildcard in the default regex() mode, but a literal dot
# with fixed() and coll()
pattern <- "a.b"
strings <- c("abb", "a.b")
str_detect(strings, pattern)
str_detect(strings, fixed(pattern))
str_detect(strings, coll(pattern))

# coll() is useful for locale-aware case-insensitive matching
i <- c("I", "\u0130", "i")
i
str_detect(i, fixed("i", TRUE))
str_detect(i, coll("i", TRUE))
str_detect(i, coll("i", TRUE, locale = "tr"))

# Word boundaries
words <- c("These are some words.")
str_count(words, boundary("word"))
str_split(words, " ")[[1]]
str_split(words, boundary("word"))[[1]]

# Regular expression variations
# ignore_case
str_extract_all("The Cat in the Hat", "[a-z]+")
str_extract_all("The Cat in the Hat", regex("[a-z]+", TRUE))

# multiline
str_extract_all("a\nb\nc", "^.")
str_extract_all("a\nb\nc", regex("^.", multiline = TRUE))

# dotall
str_extract_all("a\nb\nc", "a.")
str_extract_all("a\nb\nc", regex("a.", dotall = TRUE))