stri_wrap and stri_split_boundaries.
stri_count_boundaries.
stri_extract_all_words and then stri_unique.
stri_trans_totitle.
stri_locate_all_boundaries.
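As a rough illustration of how the functions named above might be combined, here is a minimal sketch. It assumes the stringi package is installed; the sample text and the variable names (`txt`, `words`, `loc`) are made up for the example.

```r
library(stringi)

txt <- "the cat sat on the mat. the dog sat too."

# Unique words: extract the words first, then drop duplicates
words <- stri_extract_all_words(txt)[[1]]
uniq <- stri_unique(words)

# Capitalize the first letter of each word
titled <- stri_trans_totitle("the cat sat")

# Word-wrap the text so that no output line exceeds 20 code points
wrapped <- stri_wrap(txt, width = 20)

# Locate the third word: keep only word segments, then pick row 3
loc <- stri_locate_all_boundaries(txt, type = "word", skip_word_none = TRUE)[[1]]
third <- stri_sub(txt, loc[3, "start"], loc[3, "end"])
```

Here `skip_word_none = TRUE` is what keeps spaces and punctuation out of the located segments; without it, those pieces would be counted as segments too.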
Generally, text boundary analysis is a locale-dependent operation. For example, in Japanese and Chinese, words are not separated by spaces, so a line break can occur even in the middle of a word. These languages have punctuation and diacritical marks that cannot start or end a line, so this must also be taken into account.
stringi uses ICU's BreakIterator to locate specific text boundaries. Note that the BreakIterator's behavior may be controlled in some cases, see stri_opts_brkiter.
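To make the role of stri_opts_brkiter more concrete, the following sketch passes an options list to a boundary function. It assumes the stringi package is installed; the sample string and the names `x`, `opts`, and `tokens` are illustrative only.

```r
library(stringi)

x <- "Hello, world!"

# Build break iterator options: word boundaries,
# skipping segments that are not words (spaces, punctuation)
opts <- stri_opts_brkiter(type = "word", skip_word_none = TRUE)

# Extract the text segments delimited by the chosen boundaries
tokens <- stri_extract_all_boundaries(x, opts_brkiter = opts)
```

With these options the result should match what stri_extract_all_words returns for the same input.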
The character boundary iterator tries to match what a user would think of as a "character" (a basic unit of a writing system for a language), which may be more than just a single Unicode code point.
The word boundary iterator locates the boundaries of words, for purposes such as "Find whole words" operations.
The line_break iterator locates positions that would be appropriate points to wrap lines when displaying the text.
The sentence iterator locates sentence boundaries.
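The four iterator types can be compared side by side by counting the segments each one finds in the same string. This is only a sketch, assuming the stringi package is installed; the sample sentence and variable names are invented for the example.

```r
library(stringi)

x <- "The quick brown fox. It jumped over the lazy dog!"

# Number of segments found by each type of break iterator
n_char <- stri_count_boundaries(x, type = "character")
n_word <- stri_count_boundaries(x, type = "word")
n_line <- stri_count_boundaries(x, type = "line_break")
n_sent <- stri_count_boundaries(x, type = "sentence")

# Word count that ignores spaces and punctuation
n_words_only <- stri_count_words(x)
```

Note that the plain word iterator also counts spaces and punctuation marks as segments, which is why `n_word` is larger than `n_words_only`.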
For technical details on different classes of text boundaries refer to the ICU User Guide, see below.
Other locale_sensitive: %s<%, stri_compare, stri_count_boundaries, stri_duplicated, stri_enc_detect2, stri_extract_all_boundaries, stri_locate_all_boundaries, stri_opts_collator, stri_order, stri_split_boundaries, stri_trans_tolower, stri_unique, stri_wrap, stringi-locale, stringi-search-coll
Other stringi_general_topics: stringi-arguments, stringi-encoding, stringi-locale, stringi-package, stringi-search-charclass, stringi-search-coll, stringi-search-fixed, stringi-search-regex, stringi-search
Other text_boundaries: stri_count_boundaries, stri_extract_all_boundaries, stri_locate_all_boundaries, stri_opts_brkiter, stri_split_boundaries, stri_split_lines, stri_trans_tolower, stri_wrap, stringi-search