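|
    Alternation. A|B matches either A or B.
*
    Match 0 or more times, as many times as possible.
+
    Match 1 or more times, as many times as possible.
?
    Match zero or one times. Prefer one.
{n}
    Match exactly n times.
{n,}
    Match at least n times, as many times as possible.
{n,m}
    Match between n and m times, as many times as possible, but not more than m.
*?
    Match 0 or more times, as few times as possible.
+?
    Match 1 or more times, as few times as possible.
??
    Match zero or one times. Prefer zero.
{n}?
    Match exactly n times.
{n,}?
    Match at least n times, but no more than required for an overall pattern match.
{n,m}?
    Match between n and m times, as few times as possible, but not less than n.
*+
    Match 0 or more times, as many times as possible when first encountered; do not retry with fewer even if the overall match fails (possessive match).
++
    Match 1 or more times. Possessive match.
?+
    Match zero or one times. Possessive match.
{n}+
    Match exactly n times.
{n,}+
    Match at least n times. Possessive match.
{n,m}+
    Match between n and m times. Possessive match.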
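(...)
    Capturing parentheses. The range of input that matched the parenthesized subexpression is available after the match, see stri_match.
(?:...)
    Non-capturing parentheses. Groups the included pattern, but does not capture the matching text.
(?>...)
    Atomic-match parentheses. The first match of the parenthesized subexpression is the only one tried; if it does not lead to an overall pattern match, the search backs up to a position before the (?>.
(?#...)
    Free-format comment (?# comment ).
(?=...)
    Look-ahead assertion. True if the parenthesized pattern matches at the current input position, but does not advance the input position.
(?!...)
    Negative look-ahead assertion. True if the parenthesized pattern does not match at the current input position. Does not advance the input position.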
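(?<=...)
    Look-behind assertion. True if the parenthesized pattern matches text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. (The patterns within a look-behind assertion may not be unbounded; they must not contain * or + operators.)
(?<!...)
    Negative look-behind assertion. True if the parenthesized pattern does not match text preceding the current input position. Does not alter the input position. (The patterns within a look-behind assertion may not be unbounded; they must not contain * or + operators.)
(?ismwx-ismwx:...)
    Flag settings. Evaluate the parenthesized expression with the specified flags enabled or, after -, disabled, see also stri_opts_regex.
(?ismwx-ismwx)
    Flag settings. Change the flag settings; the change applies to the portion of the pattern following the setting. For example, (?i) changes to a case insensitive match, see also stri_opts_regex.
\a
    Match a BELL, \u0007.
\A
    Match at the beginning of the input. Differs from ^ in that \A will not match after a new line within the input.
\b
    Match if the current position is a word boundary. Boundaries occur at the transitions between word (\w) and non-word (\W) characters, with combining marks ignored. For better word boundaries, see ICU Boundary Analysis, e.g. stri_extract_all_words.
\B
    Match if the current position is not a word boundary.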
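\cX
    Match a control-X character.
\d
    Match any character with the Unicode General Category of Nd (Number, Decimal Digit).
\D
    Match any character that is not a decimal digit.
\e
    Match an ESCAPE, \u001B.
\E
    Terminates a \Q ... \E quoted sequence.
\f
    Match a FORM FEED, \u000C.
\G
    Match if the current position is at the end of the previous match.
\n
    Match a LINE FEED, \u000A.
\N{UNICODE CHARACTER NAME}
    Match the named character.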
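\p{UNICODE PROPERTY NAME}
    Match any character with the specified Unicode Property.
\P{UNICODE PROPERTY NAME}
    Match any character not having the specified Unicode Property.
\Q
    Quotes all following characters until \E.
\r
    Match a CARRIAGE RETURN, \u000D.
\s
    Match a white space character. White space is defined as [\t\n\f\r\p{Z}].
\S
    Match a non-white space character.
\t
    Match a HORIZONTAL TABULATION, \u0009.
\uhhhh
    Match the character with the hex value hhhh.
\Uhhhhhhhh
    Match the character with the hex value hhhhhhhh. Exactly eight hex digits must be provided, even though the largest Unicode code point is \U0010ffff.
\w
    Match a word character. Word characters are [\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\u200c\u200d].
\W
    Match a non-word character.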
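\x{hhhh}
    Match the character with the hex value hhhh. From one to six hex digits may be supplied.
\xhh
    Match the character with the two-digit hex value hh.
\X
    Match a Grapheme Cluster.
\Z
    Match if the current position is at the end of input, but before the final line terminator, if one exists.
\z
    Match if the current position is at the end of input.
\n
    Back reference. Match whatever the nth capturing group matched. n must be at least 1 and at most the total number of capture groups in the pattern.
\0ooo
    Match an octal character. 'ooo' is from one to three octal digits. 0377 is the largest allowed octal character. The leading zero is required; it distinguishes octal constants from back references.
[pattern]
    Match any one character from the set.
.
    Match any character except, by default, a line terminator; see stri_opts_regex.
^
    Match at the beginning of a line.
$
    Match at the end of a line.
\
    Outside of sets: quotes the following character. Characters that must be quoted to be treated as literals are * ? + [ ( ) { } ^ $ | \ .
\
    Inside sets: quotes the following character. Characters that must be quoted to be treated as literals are [ ] \; characters that may need to be quoted, depending on the context, are - &.
If pattern is empty, then all functions in stringi return NA and generate a warning.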
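
A few of the operators and metacharacters listed above can be tried out with a short sketch like the one below. The input strings and variable names are made up for illustration; only functions exported by stringi are used. Note that inside an R string literal every regex backslash has to be doubled.

```r
library(stringi)

x <- "<b>bold</b> and <i>italic</i>"

# Greedy: .+ takes as much as it can, so the match runs to the last ">".
greedy <- stri_extract_first_regex(x, "<.+>")
# Lazy: .+? takes as little as it can, so the match stops at the first ">".
lazy <- stri_extract_first_regex(x, "<.+?>")
# Possessive: .++ consumes the rest of the string and never gives it back,
# so the closing ">" can no longer match.
possessive <- stri_detect_regex(x, "<.++>")

# Look-behind: runs of digits preceded by a dollar sign.
amounts <- stri_extract_all_regex("price: $42, tax: $7", "(?<=\\$)\\d+")[[1]]

# Unicode property: uppercase letters.
capitals <- stri_extract_all_regex("Hello World", "\\p{Lu}")[[1]]

# Back reference: any character immediately followed by itself.
doubled <- stri_detect_regex(c("hello", "world"), "(.)\\1")

# An empty pattern yields NA together with a warning (suppressed here).
empty <- suppressWarnings(stri_detect_regex("abc", ""))

print(list(greedy = greedy, lazy = lazy, possessive = possessive,
           amounts = amounts, capitals = capitals,
           doubled = doubled, empty = empty))
```

Here the greedy pattern returns the whole input, the lazy one returns only "<b>", and the possessive one fails to match at all.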
On a syntax error, a fairly informative failure message is shown.
If you would like to search for a fixed pattern,
refer to stringi-search-coll or stringi-search-fixed.
These allow a locale-aware text lookup
or a very fast exact-byte search, respectively.
The stri_*_regex functions in stringi use
the ICU regex engine. Its settings may be tuned (for example,
to perform a case-insensitive search); see the
stri_opts_regex
function for more details.
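
As a rough illustration of the points above, the sketch below runs a case-insensitive regex search in two ways and contrasts it with the fixed and collator-based alternatives. The sample vector and variable names are hypothetical, and the collator result assumes a default locale in which primary-strength comparison ignores case.

```r
library(stringi)

x <- c("Regex", "REGEX", "a.b", "axb")

# Case-insensitive search requested through the options list ...
ci_opts <- stri_detect_regex(
  x, "regex", opts_regex = stri_opts_regex(case_insensitive = TRUE)
)
# ... or through the inline (?i) flag inside the pattern itself.
ci_inline <- stri_detect_regex(x, "(?i)regex")

# "." is a metacharacter for the regex engine but a plain byte
# for the fixed-pattern search.
dot_regex <- stri_detect_regex(x, "a.b")
dot_fixed <- stri_detect_fixed(x, "a.b")

# Locale-aware lookup: at primary collation strength, case differences
# are ignored (assumption: default locale, no special case rules).
coll_hit <- stri_detect_coll(
  "REGEX", "regex", opts_collator = stri_opts_collator(strength = 1)
)

print(rbind(ci_opts, ci_inline, dot_regex, dot_fixed))
print(coll_hit)
```

The regex pattern "a.b" matches both "a.b" and "axb", whereas the fixed search matches only the literal "a.b".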
Regular expression patterns in ICU are quite similar in form and
behavior to Perl's regexes. Their implementation is loosely inspired
by JDK 1.4 java.util.regex.
ICU Regular Expressions conform to the Unicode Technical Standard #18
(see References section) and their features are summarized in
the ICU User Guide (see below). A good general introduction
to regexes is (Friedl, 2002).
Some general topics are also covered in the R manual, see regex.
J.E.F. Friedl, Mastering Regular Expressions, O'Reilly, 2002
Unicode Regular Expressions -- Unicode Technical Standard #18, http://www.unicode.org/reports/tr18/
Unicode Regular Expressions -- Regex tutorial, http://www.regular-expressions.info/unicode.html
stri_opts_regex, stringi-search
Other stringi_general_topics: stringi-arguments, stringi-encoding, stringi-locale, stringi-package, stringi-search-boundaries, stringi-search-charclass, stringi-search-coll, stringi-search-fixed, stringi-search