stringi (version 1.6.2)

about_search_boundaries: Text Boundary Analysis in stringi


Text boundary analysis is the process of locating linguistic boundaries while formatting and handling text.



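Examples of the boundary analysis process include:

  • Locating positions at which text can be wrapped to fit within given margins,

  • Counting characters, words, or sentences,

  • Determining whether a given range of text contains only whole words.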

Generally, text boundary analysis is a locale-dependent operation. For example, Japanese and Chinese do not separate words with spaces, so a line break can occur even in the middle of a word. These languages also have punctuation and diacritical marks that cannot start or end a line, and this must be taken into account as well.
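As a rough sketch of the locale dependence described above, the snippet below asks for line-break opportunities in a short Japanese phrase that contains no spaces. The sample string and the variable names are made up for this illustration, and the locale identifier "ja_JP" is an assumption about what the installed ICU supports. The exact segmentation depends on the ICU version and data in use, so no output is shown.

```r
library(stringi)

# Hypothetical sample text: a Japanese phrase written without spaces
x <- "\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8"

# Break iterator settings: line-break analysis under a Japanese locale
# ("ja_JP" is an assumed locale identifier)
opts <- stri_opts_brkiter(type = "line_break", locale = "ja_JP")

# Segments after which a line could be wrapped; the split itself varies
# with the ICU build, so it is not asserted here
segments <- stri_split_boundaries(x, opts_brkiter = opts)[[1]]

# With no skip_* options set, the segments concatenate back to the input
round_trip <- stri_join(segments, collapse = "")
round_trip == x
```

Passing the settings through stri_opts_brkiter() keeps the locale choice explicit instead of relying on the session's default locale.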

stringi uses ICU's BreakIterator to locate specific text boundaries. Note that the BreakIterator's behavior may be controlled in some cases; see stri_opts_brkiter.

  • The character boundary iterator tries to match what a user would think of as a "character" -- a basic unit of a writing system for a language -- which may be more than just a single Unicode code point.

  • The word boundary iterator locates the boundaries of words, for purposes such as "Find whole words" operations.

  • The line_break iterator locates positions that would be appropriate to wrap lines when displaying the text.

  • The break iterator of type sentence locates sentence boundaries.
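The four iterator types listed above can be tried on a short English string, as in the minimal sketch below. The sample text and variable names are invented for this illustration; the results in the comments apply to this particular input and may differ for other texts or locales.

```r
library(stringi)

x <- "The quick brown fox. It jumped!"   # made-up sample text

# character: count user-perceived characters
n_chars <- stri_count_boundaries(x, type = "character")             # 31

# a base letter plus a combining accent is one character but two code points
accented <- "a\u0301"
n_clusters <- stri_count_boundaries(accented, type = "character")   # 1
n_points <- stri_length(accented)                                    # 2

# word: skip_word_none = TRUE drops segments that are not words
# (here, the spaces and punctuation)
words <- stri_split_boundaries(x, type = "word", skip_word_none = TRUE)[[1]]
# "The" "quick" "brown" "fox" "It" "jumped"

# line_break: each segment ends where a line could be wrapped
wrap_points <- stri_split_boundaries(x, type = "line_break")[[1]]
# "The " "quick " "brown " "fox. " "It " "jumped!"

# sentence: each segment is one sentence, trailing space included
sentences <- stri_split_boundaries(x, type = "sentence")[[1]]
# "The quick brown fox. " "It jumped!"
```

The skip_word_none argument is forwarded to stri_opts_brkiter; without it, the word iterator also returns the whitespace and punctuation segments between words.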

For technical details on the different classes of text boundaries, refer to the ICU User Guide listed below.


References

Boundary Analysis -- ICU User Guide

See Also

The official online manual of stringi at

Other locale_sensitive: %s<%(), about_locale, about_search_coll, stri_compare(), stri_count_boundaries(), stri_duplicated(), stri_enc_detect2(), stri_extract_all_boundaries(), stri_locate_all_boundaries(), stri_opts_collator(), stri_order(), stri_rank(), stri_sort_key(), stri_sort(), stri_split_boundaries(), stri_trans_tolower(), stri_unique(), stri_wrap()

Other text_boundaries: about_search, stri_count_boundaries(), stri_extract_all_boundaries(), stri_locate_all_boundaries(), stri_opts_brkiter(), stri_split_boundaries(), stri_split_lines(), stri_trans_tolower(), stri_wrap()

Other stringi_general_topics: about_arguments, about_encoding, about_locale, about_search_charclass, about_search_coll, about_search_fixed, about_search_regex, about_search, about_stringi