stringi (version 1.1.5)

stringi-search-boundaries: Text Boundary Analysis in stringi

Description

Text boundary analysis is the process of locating linguistic boundaries while formatting and handling text.

Details

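Examples of the boundary analysis process include:

  • Locating positions at which to word-wrap text to fit within given margins, see stri_wrap.

  • Counting characters, words, sentences, or paragraphs, see stri_count_boundaries.

  • Making a list of the unique words in a document, see stri_extract_all_boundaries and stri_unique.

  • Locating a particular unit of the text (for example, the third word in a document), see stri_locate_all_boundaries.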

Generally, text boundary analysis is a locale-dependent operation. For example, Japanese and Chinese do not separate words with spaces, and a line break can occur even in the middle of a word. These languages also have punctuation and diacritical marks that cannot start or end a line, which must be taken into account as well.
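A minimal sketch of why spaces alone are not enough, assuming stringi is installed. The Japanese string below is an illustrative example written with \u escapes; ICU still finds many candidate line-break positions in it, whereas in the English string breaks occur only after the spaces. Extra arguments such as type are forwarded to stri_opts_brkiter, which also accepts a locale.

```r
library("stringi")

# "日本語の文です", a short Japanese phrase with no spaces between words,
# written with \u escapes so the source file stays ASCII
ja <- "\u65e5\u672c\u8a9e\u306e\u6587\u3067\u3059"
en <- "The quick brown fox"

# Split at candidate line-break positions; `type` is passed on to
# stri_opts_brkiter()
ja_lines <- stri_split_boundaries(ja, type = "line_break")[[1]]
en_lines <- stri_split_boundaries(en, type = "line_break")[[1]]

print(ja_lines)  # several short pieces, although the text has no spaces
print(en_lines)  # breaks only after the spaces

# The locale can also be set explicitly via stri_opts_brkiter()
ja_lines_jp <- stri_split_boundaries(
  ja, opts_brkiter = stri_opts_brkiter(type = "line_break", locale = "ja_JP"))
print(ja_lines_jp)
```

Because no skip_* option is set, the pieces concatenate back to the original string.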

stringi uses ICU's BreakIterator to locate specific text boundaries. Note that the BreakIterator's behavior may be controlled in some cases, see stri_opts_brkiter.

  • The character boundary iterator tries to match what a user would think of as a "character" (a basic unit of a writing system for a language), which may be more than just a single Unicode code point.

  • The word boundary iterator locates the boundaries of words, for purposes such as "Find whole words" operations.

  • The line_break iterator locates positions that would be appropriate points to wrap lines when displaying the text.

  • A break iterator of type sentence locates sentence boundaries.
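The four iterator types can be compared on a single string. The following is a minimal sketch assuming stringi is installed; the input strings are illustrative. It also shows that "e" followed by U+0301 (combining acute accent) is two code points but one user-perceived character. The option skip_word_none = TRUE is a stri_opts_brkiter setting that drops the non-word spans (spaces, punctuation) between words.

```r
library("stringi")

x <- "Hello, world! How are you?"

# character: user-perceived characters (one per code point for plain ASCII)
n_chars <- stri_count_boundaries(x, type = "character")

# "e" + U+0301 COMBINING ACUTE ACCENT: two code points, one character
e_acute <- "e\u0301"
n_code_points <- stri_length(e_acute)
n_graphemes <- stri_count_boundaries(e_acute, type = "character")

# word: keep only the words, dropping spaces and punctuation
words <- stri_split_boundaries(x, type = "word", skip_word_none = TRUE)[[1]]

# sentence: each piece keeps its trailing whitespace
sentences <- stri_split_boundaries(x, type = "sentence")[[1]]

# line_break: places where the text may be wrapped
line_pieces <- stri_split_boundaries(x, type = "line_break")[[1]]

print(c(n_chars = n_chars, n_code_points = n_code_points,
        n_graphemes = n_graphemes))
print(words)
print(sentences)
print(line_pieces)
```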

For technical details on the different classes of text boundaries, refer to the ICU User Guide (see References).

References

Boundary Analysis -- ICU User Guide, http://userguide.icu-project.org/boundaryanalysis

See Also

Other locale_sensitive: %s<%, stri_compare, stri_count_boundaries, stri_duplicated, stri_enc_detect2, stri_extract_all_boundaries, stri_locate_all_boundaries, stri_opts_collator, stri_order, stri_split_boundaries, stri_trans_tolower, stri_unique, stri_wrap, stringi-locale, stringi-search-coll

Other text_boundaries: stri_count_boundaries, stri_extract_all_boundaries, stri_locate_all_boundaries, stri_opts_brkiter, stri_split_boundaries, stri_split_lines, stri_trans_tolower, stri_wrap, stringi-search

Other stringi_general_topics: stringi-arguments, stringi-encoding, stringi-locale, stringi-package, stringi-search-charclass, stringi-search-coll, stringi-search-fixed, stringi-search-regex, stringi-search