stringi (version 0.3-1)

stringi-search-boundaries: Text Boundary Analysis in stringi


Text boundary analysis is the process of locating linguistic boundaries while formatting and handling text.



Examples of the boundary analysis process include:

Generally, text boundary analysis is a locale-dependent operation. For example, in Japanese and Chinese, words are not separated by spaces, so a line break can occur even in the middle of a word. These languages also have punctuation and diacritical marks that cannot start or end a line, which must be taken into account as well.

stringi uses ICU's BreakIterator to locate specific text boundaries. Note that the BreakIterator's behavior may be controlled in some cases; see stri_opts_brkiter.
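
As a minimal sketch of how the iterator's behavior can be adjusted, the example below splits a string with the word iterator twice: once with default settings and once asking it to skip segments that are not words. It assumes the skip_word_none option of stri_opts_brkiter as found in current stringi releases; the variable names are illustrative only.

```r
library(stringi)

x <- "Hello, world!"

# Default word iterator: every segment between two boundaries is returned,
# including punctuation and whitespace.
all_segments <- stri_split_boundaries(
  x, opts_brkiter = stri_opts_brkiter(type = "word")
)[[1]]

# skip_word_none = TRUE (assumed available in the installed version)
# drops segments that the iterator does not classify as words.
words_only <- stri_split_boundaries(
  x, opts_brkiter = stri_opts_brkiter(type = "word", skip_word_none = TRUE)
)[[1]]

print(all_segments)
print(words_only)
```

With default settings the segments concatenate back to the original string, whereas the second call keeps only the word tokens.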

  • The character boundary iterator tries to match what a user would think of as a "character" -- a basic unit of a writing system for a language -- which may be more than just a single Unicode code point.
  • The word boundary iterator locates the boundaries of words, for purposes such as "Find whole words" operations.
  • The line_break iterator locates positions that would be appropriate points to wrap lines when displaying the text.
  • A break iterator of type sentence locates sentence boundaries.
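
The four iterator types listed above can be compared on a small input. The following is a sketch rather than a reference implementation: split_by is a hypothetical helper defined only for this example, and the type is passed through stri_opts_brkiter, which is assumed to accept the type names "character", "word", "line_break", and "sentence".

```r
library(stringi)

x <- "The quick brown fox. It jumped over the lazy dog."

# Hypothetical helper for this example: split `s` using one iterator type.
split_by <- function(s, type) {
  stri_split_boundaries(s, opts_brkiter = stri_opts_brkiter(type = type))[[1]]
}

# character: a base letter followed by a combining accent is two code points
# but a single user-perceived character.
accented <- "a\u0301"
n_code_points <- stri_length(accented)
n_characters <- stri_count_boundaries(
  accented, opts_brkiter = stri_opts_brkiter(type = "character")
)

# word: segments between word boundaries (spaces and punctuation included),
# alongside the number of actual words.
word_segments <- split_by(x, "word")
n_words <- stri_count_words(x)

# line_break: chunks after which a line could be wrapped.
line_chunks <- split_by(x, "line_break")

# sentence: one element per sentence.
sentences <- split_by(x, "sentence")

print(c(code_points = n_code_points, characters = n_characters))
print(n_words)
print(line_chunks)
print(sentences)
```

Because splitting at boundaries does not discard any text by default, pasting the word segments or the line chunks back together reproduces the input string.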

For technical details on the different classes of text boundaries, refer to the ICU User Guide (see below).


References

Boundary Analysis -- ICU User Guide

See Also

Other locale_sensitive: %s!==%, %s!=%, %s<=%, %s<%, %s===%, %s==%, %s>=%, %s>%, %stri!==%, %stri!=%, %stri<=%, %stri<%, %stri===%, %stri==%, %stri>=%, %stri>%; stri_cmp, stri_cmp_eq, stri_cmp_equiv, stri_cmp_ge, stri_cmp_gt, stri_cmp_le, stri_cmp_lt, stri_cmp_neq, stri_cmp_nequiv, stri_compare; stri_count_boundaries, stri_count_words; stri_duplicated, stri_duplicated_any; stri_enc_detect2; stri_extract_words; stri_locate_boundaries, stri_locate_words; stri_opts_collator; stri_order, stri_sort; stri_split_boundaries; stri_trans_tolower, stri_trans_totitle, stri_trans_toupper; stri_unique; stri_wrap; stringi-locale; stringi-search-coll

Other stringi_general_topics: stringi-arguments; stringi-encoding; stringi-locale; stringi-package; stringi-search-charclass; stringi-search-coll; stringi-search-fixed; stringi-search-regex; stringi-search

Other text_boundaries: stri_count_boundaries, stri_count_words; stri_extract_words; stri_locate_boundaries, stri_locate_words; stri_opts_brkiter; stri_split_boundaries; stri_split_lines, stri_split_lines1; stri_trans_tolower, stri_trans_totitle, stri_trans_toupper; stri_wrap; stringi-search