Cc -- a C0 or C1 control code;
Cf -- a format control character;
Cn -- a reserved unassigned code point or a non-character;
Co -- a private-use character;
Cs -- a surrogate code point;
Lc -- the union of Lu, Ll, Lt;
Ll -- a lowercase letter;
Lm -- a modifier letter;
Lo -- other letters, including syllables and ideographs;
Lt -- a digraphic character, with first part uppercase;
Lu -- an uppercase letter;
Mc -- a spacing combining mark (positive advance width);
Me -- an enclosing combining mark;
Mn -- a non-spacing combining mark (zero advance width);
Nd -- a decimal digit;
Nl -- a letter-like numeric character;
No -- a numeric character of other type;
Pd -- a dash or hyphen punctuation mark;
Ps -- an opening punctuation mark (of a pair);
Pe -- a closing punctuation mark (of a pair);
Pc -- a connecting punctuation mark, like a tie;
Po -- a punctuation mark of other type;
Pi -- an initial quotation mark;
Pf -- a final quotation mark;
Sm -- a symbol of mathematical use;
Sc -- a currency sign;
Sk -- a non-letter-like modifier symbol;
So -- a symbol of other type;
Zs -- a space character (of non-zero width);
Zl -- U+2028 LINE SEPARATOR only;
Zp -- U+2029 PARAGRAPH SEPARATOR only;
C -- the union of Cc, Cf, Cs, Co, Cn;
L -- the union of Lu, Ll, Lt, Lm, Lo;
M -- the union of Mn, Mc, Me;
N -- the union of Nd, Nl, No;
P -- the union of Pc, Pd, Ps, Pe, Pi, Pf, Po;
S -- the union of Sm, Sc, Sk, So;
Z -- the union of Zs, Zl, Zp.

Here is the complete list of supported Binary Properties:
ALPHABETIC -- an alphabetic character;
ASCII_HEX_DIGIT -- a character matching the [0-9A-Fa-f] regex;
BIDI_CONTROL -- a format control character that has a specific function in the Bidi (bidirectional text) Algorithm;
BIDI_MIRRORED -- a character that may change display in right-to-left text;
DASH -- a kind of dash character;
DEFAULT_IGNORABLE_CODE_POINT -- characters that are ignorable in most text processing activities, e.g. <2060..206F, FFF0..FFFB, E0000..E0FFF>;
DEPRECATED -- a deprecated character according to the current Unicode standard (the usage of deprecated characters is strongly discouraged);
DIACRITIC -- a character that linguistically modifies the meaning of another character to which it applies;
EXTENDER -- a character that extends the value or shape of a preceding alphabetic character, e.g. a length and iteration mark;
FULL_COMPOSITION_EXCLUSION;
GRAPHEME_BASE;
GRAPHEME_EXTEND;
GRAPHEME_LINK;
HEX_DIGIT -- a character commonly used for hexadecimal numbers, cf. also ASCII_HEX_DIGIT;
HYPHEN -- a dash used to mark connections between pieces of words, plus the Katakana middle dot;
ID_CONTINUE -- a character that can continue an identifier, ID_START + Mn + Mc + Nd + Pc;
ID_START -- a character that can start an identifier, Lu + Ll + Lt + Lm + Lo + Nl;
IDEOGRAPHIC -- a CJKV (Chinese-Japanese-Korean-Vietnamese) ideograph;
IDS_BINARY_OPERATOR;
IDS_TRINARY_OPERATOR;
JOIN_CONTROL;
LOGICAL_ORDER_EXCEPTION;
LOWERCASE;
MATH;
NONCHARACTER_CODE_POINT;
QUOTATION_MARK;
RADICAL;
SOFT_DOTTED -- a character with a "soft dot", like i or j, such that an accent placed on this character causes the dot to disappear;
TERMINAL_PUNCTUATION -- a punctuation character that generally marks the end of textual units;
UNIFIED_IDEOGRAPH;
UPPERCASE;
WHITE_SPACE -- a space character or TAB or CR or LF or ZWSP or ZWNBSP;
XID_CONTINUE;
XID_START;
CASE_SENSITIVE;
S_TERM;
VARIATION_SELECTOR;
NFD_INERT;
NFKD_INERT;
NFC_INERT;
NFKC_INERT;
SEGMENT_STARTER;
PATTERN_SYNTAX;
PATTERN_WHITE_SPACE;
POSIX_ALNUM;
POSIX_BLANK;
POSIX_GRAPH;
POSIX_PRINT;
POSIX_XDIGIT;
CASED;
CASE_IGNORABLE;
CHANGES_WHEN_LOWERCASED;
CHANGES_WHEN_UPPERCASED;
CHANGES_WHEN_TITLECASED;
CHANGES_WHEN_CASEFOLDED;
CHANGES_WHEN_CASEMAPPED;
CHANGES_WHEN_NFKC_CASEFOLDED.

There are two separate ways to specify character classes in the stri_*_charclass functions:

with a General Category identifier, e.g. Lu for uppercase letters (a 1-2 letter identifier; the same may be used in regexes by specifying e.g. \p{Lu});
with a Binary Property identifier, e.g. WHITE_SPACE.

Additionally, each class identifier may be preceded with '^', which requests the complement of the given character class, i.e. it matches the characters that are not in the class.
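The General Category identifiers, their one-letter unions, and the '^' complement can be illustrated outside of R. The following is a minimal sketch in Python, using only the standard unicodedata module. The helper name in_class is hypothetical, and this is not a description of how stringi performs matching. It covers General Category identifiers only (no Binary Properties) and treats Lc as the union of Lu, Ll, Lt, as listed above.

```python
import unicodedata

def in_class(ch, spec):
    """Hypothetical helper: test one character against a General Category
    identifier such as 'Lu', 'L' or 'Lc', optionally complemented ('^Lu')."""
    negate = spec.startswith("^")
    name = spec[1:] if negate else spec
    cat = unicodedata.category(ch)      # two-letter category, e.g. 'Lu'
    if name == "Lc":                    # union of Lu, Ll, Lt
        hit = cat in ("Lu", "Ll", "Lt")
    elif len(name) == 1:                # one-letter union, e.g. 'L' or 'N'
        hit = cat[0] == name
    else:                               # exact two-letter category
        hit = cat == name
    return hit != negate                # '^' flips the result

print(in_class("A", "Lu"))    # uppercase letter against Lu
print(in_class("a", "^Lu"))   # lowercase letter against the complement of Lu
print(in_class("5", "N"))     # decimal digit against the union N
```

A complemented identifier is simply the logical negation of the plain one, so for any character exactly one of in_class(ch, "Lu") and in_class(ch, "^Lu") holds.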
Please note that some classes may seem to overlap. However,
e.g. General Category Z (some space) and Binary
Property WHITE_SPACE match different character
sets.
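To make the difference concrete, the sketch below looks up the General Category of a few characters that the WHITE_SPACE description above mentions (TAB, CR, LF) alongside ordinary separators. It is written in Python with the standard unicodedata module purely as an illustration; the helper name in_z and the sample table are assumptions made for this example and are not part of stringi.

```python
import unicodedata

# A few characters to inspect; the labels are for display only.
samples = {
    "SPACE": " ",
    "TAB": "\t",
    "LF": "\n",
    "CR": "\r",
    "LINE SEPARATOR": "\u2028",
    "PARAGRAPH SEPARATOR": "\u2029",
}

def in_z(ch):
    """Hypothetical helper: True if ch is in General Category Z (Zs, Zl or Zp)."""
    return unicodedata.category(ch).startswith("Z")

for label, ch in samples.items():
    # Print the label, the two-letter category, and membership in Z.
    print(label, unicodedata.category(ch), in_z(ch))
```

TAB, LF and CR are reported with category Cc, so they fall outside Z even though the WHITE_SPACE description above includes them.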
stri_count_charclass;
stri_detect_charclass;
stri_extract_all_charclass,
stri_extract_first_charclass,
stri_extract_last_charclass;
stri_locate_all_charclass,
stri_locate_first_charclass,
stri_locate_last_charclass;
stri_replace_all_charclass,
stri_replace_first_charclass,
stri_replace_last_charclass;
stri_split_charclass;
stri_trim, stri_trim_both,
stri_trim_left, stri_trim_right;
stringi-search

Other stringi_general_topics:
stringi-arguments;
stringi-encoding;
stringi-locale;
stringi-package;
stringi-search-fixed;
stringi-search-regex;
stringi-search