Cc -- a C0 or C1 control code;
Cf -- a format control character;
Cn -- a reserved unassigned code point or a non-character;
Co -- a private-use character;
Cs -- a surrogate code point;
Lc -- the union of Lu, Ll, Lt;
Ll -- a lowercase letter;
Lm -- a modifier letter;
Lo -- other letters, including syllables and ideographs;
Lt -- a digraphic character, with first part uppercase;
Lu -- an uppercase letter;
Mc -- a spacing combining mark (positive advance width);
Me -- an enclosing combining mark;
Mn -- a non-spacing combining mark (zero advance width);
Nd -- a decimal digit;
Nl -- a letter-like numeric character;
No -- a numeric character of other type;
Pd -- a dash or hyphen punctuation mark;
Ps -- an opening punctuation mark (of a pair);
Pe -- a closing punctuation mark (of a pair);
Pc -- a connecting punctuation mark, like a tie;
Po -- a punctuation mark of other type;
Pi -- an initial quotation mark;
Pf -- a final quotation mark;
Sm -- a symbol of mathematical use;
Sc -- a currency sign;
Sk -- a non-letter-like modifier symbol;
So -- a symbol of other type;
Zs -- a space character (of non-zero width);
Zl -- U+2028 LINE SEPARATOR only;
Zp -- U+2029 PARAGRAPH SEPARATOR only;
C -- the union of Cc, Cf, Cs, Co, Cn;
L -- the union of Lu, Ll, Lt, Lm, Lo;
M -- the union of Mn, Mc, Me;
N -- the union of Nd, Nl, No;
P -- the union of Pc, Pd, Ps, Pe, Pi, Pf, Po;
S -- the union of Sm, Sc, Sk, So;
Z -- the union of Zs, Zl, Zp.

Here is the complete list of supported Binary Properties:
ALPHABETIC -- an alphabetic character;
ASCII_HEX_DIGIT -- a character matching the [0-9A-Fa-f] regex;
BIDI_CONTROL -- a format control that has a specific function in the Bidi (bidirectional text) Algorithm;
BIDI_MIRRORED -- a character that may change display in right-to-left text;
DASH -- a kind of dash character;
DEFAULT_IGNORABLE_CODE_POINT -- characters that are ignorable in most text processing activities, e.g. <2060..206F, FFF0..FFFB, E0000..E0FFF>;
DEPRECATED -- a deprecated character according to the current Unicode standard (the usage of deprecated characters is strongly discouraged);
DIACRITIC -- a character that linguistically modifies the meaning of another character to which it applies;
EXTENDER -- a character that extends the value or shape of a preceding alphabetic character, e.g. a length and iteration mark;
FULL_COMPOSITION_EXCLUSION;
GRAPHEME_BASE;
GRAPHEME_EXTEND;
GRAPHEME_LINK;
HEX_DIGIT -- a character commonly used for hexadecimal numbers, cf. also ASCII_HEX_DIGIT;
HYPHEN -- a dash used to mark connections between pieces of words, plus the Katakana middle dot;
ID_CONTINUE -- a character that can continue an identifier, ID_START+Mn+Mc+Nd+Pc;
ID_START -- a character that can start an identifier, Lu+Ll+Lt+Lm+Lo+Nl;
IDEOGRAPHIC -- a CJKV (Chinese-Japanese-Korean-Vietnamese) ideograph;
IDS_BINARY_OPERATOR;
IDS_TRINARY_OPERATOR;
JOIN_CONTROL;
LOGICAL_ORDER_EXCEPTION;
LOWERCASE;
MATH;
NONCHARACTER_CODE_POINT;
QUOTATION_MARK;
RADICAL;
SOFT_DOTTED -- a character with a "soft dot", like i or j, such that an accent placed on this character causes the dot to disappear;
TERMINAL_PUNCTUATION -- a punctuation character that generally marks the end of textual units;
UNIFIED_IDEOGRAPH;
UPPERCASE;
WHITE_SPACE -- a space character, TAB, CR, or LF (but not ZWSP or ZWNBSP);
XID_CONTINUE;
XID_START;
CASE_SENSITIVE;
S_TERM;
VARIATION_SELECTOR;
NFD_INERT;
NFKD_INERT;
NFC_INERT;
NFKC_INERT;
SEGMENT_STARTER;
PATTERN_SYNTAX;
PATTERN_WHITE_SPACE;
POSIX_ALNUM;
POSIX_BLANK;
POSIX_GRAPH;
POSIX_PRINT;
POSIX_XDIGIT;
CASED;
CASE_IGNORABLE;
CHANGES_WHEN_LOWERCASED;
CHANGES_WHEN_UPPERCASED;
CHANGES_WHEN_TITLECASED;
CHANGES_WHEN_CASEFOLDED;
CHANGES_WHEN_CASEMAPPED;
CHANGES_WHEN_NFKC_CASEFOLDED.

There are two separate ways to specify character classes in stri_*_charclass functions: with a General Category identifier, e.g. Lu for uppercase letters (a 1-2 letter identifier; the same may be used in regexes by specifying e.g. \p{Lu}), or with a Binary Property identifier, e.g. WHITE_SPACE.
Additionally, each class identifier may be preceded by '^' to request the complement of the given character class, i.e. to match characters that are not in the class.
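The identifier rules described here (a two-letter category, a one-letter union, and a leading '^' for the complement) can be sketched outside of stringi. The snippet below is only an illustration in Python, using the standard `unicodedata` module; the helper name `in_class` is hypothetical and is not part of stringi or of any library.

```python
import unicodedata

def in_class(ch, spec):
    """Check one character against a General Category spec.

    `spec` is a category identifier such as 'Lu' or 'L', optionally
    preceded by '^' to request the complement. Hypothetical helper,
    written for illustration only.
    """
    negate = spec.startswith("^")
    name = spec[1:] if negate else spec
    cat = unicodedata.category(ch)      # two-letter category, e.g. 'Lu'
    if name == "Lc":                    # Lc is the union of Lu, Ll, Lt
        hit = cat in ("Lu", "Ll", "Lt")
    elif len(name) == 1:                # one letter: union of its subcategories
        hit = cat[0] == name
    else:                               # two letters: exact category
        hit = cat == name
    return hit != negate                # flip the answer for a '^' spec

print(in_class("A", "Lu"))    # uppercase letter against Lu
print(in_class("a", "^Lu"))   # lowercase letter against the complement of Lu
print(in_class("5", "N"))     # decimal digit against the union N
```

Binary Properties such as WHITE_SPACE are not covered by this sketch, since `unicodedata` does not expose them.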
Please note that some classes may seem to overlap. However, e.g. General Category Z (some space) and Binary Property WHITE_SPACE match different character sets.
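To make the difference between Z and WHITE_SPACE concrete, the following sketch compares the two sets in Python. It is an illustration under stated assumptions, not stringi code: the White_Space code points are hard-coded here from Unicode's PropList.txt, and the Z set is computed from whatever Unicode version the local `unicodedata` module ships with.

```python
import unicodedata

# White_Space code points, hard-coded from Unicode's PropList.txt (assumption:
# this list matches the Unicode version bundled with your Python).
WHITE_SPACE = (
    set(range(0x0009, 0x000E))          # TAB, LF, VT, FF, CR
    | {0x0020, 0x0085, 0x00A0, 0x1680}
    | set(range(0x2000, 0x200B))        # EN QUAD .. HAIR SPACE
    | {0x2028, 0x2029, 0x202F, 0x205F, 0x3000}
)

# Every code point whose General Category is Zs, Zl, or Zp.
CATEGORY_Z = {
    cp for cp in range(0x110000)
    if unicodedata.category(chr(cp)).startswith("Z")
}

# Code points that are White_Space without belonging to category Z.
only_white_space = sorted(WHITE_SPACE - CATEGORY_Z)
for cp in only_white_space:
    cat = unicodedata.category(chr(cp))
    print(f"U+{cp:04X} is White_Space, but its General Category is {cat}")
```

With a recent Python, the characters reported should be control codes (category Cc) such as TAB and LF: they count as white space, yet they are not in Z.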
stri_count_charclass; stri_detect_charclass;
stri_extract_all_charclass, stri_extract_first_charclass, stri_extract_last_charclass;
stri_locate_all_charclass, stri_locate_first_charclass, stri_locate_last_charclass;
stri_replace_all_charclass, stri_replace_first_charclass, stri_replace_last_charclass;
stri_split_charclass;
stri_trim, stri_trim_both, stri_trim_left, stri_trim_right;
stringi-search
Other stringi_general_topics: stringi-arguments; stringi-encoding; stringi-locale; stringi-package; stringi-search-fixed; stringi-search-regex; stringi-search