stringi (version 1.6.2)

stri_locate_all_boundaries: Locate Text Boundaries


These functions locate text boundaries (such as character, word, line, or sentence boundaries). Use stri_locate_all_* to locate all the matches. stri_locate_first_* and stri_locate_last_* return the first or the last match, respectively.


stri_locate_all_boundaries(
  str,
  omit_no_match = FALSE,
  ...,
  opts_brkiter = NULL
)

stri_locate_last_boundaries(str, ..., opts_brkiter = NULL)

stri_locate_first_boundaries(str, ..., opts_brkiter = NULL)

stri_locate_all_words(str, omit_no_match = FALSE, locale = NULL)

stri_locate_last_words(str, locale = NULL)

stri_locate_first_words(str, locale = NULL)



str: character vector or an object coercible to one


omit_no_match: single logical value; if FALSE, then two missing values will indicate that there are no text boundaries


...: additional settings for opts_brkiter


opts_brkiter: a named list with ICU BreakIterator's settings, see stri_opts_brkiter; NULL for the default break iterator, i.e., line_break


locale: NULL or '' for text boundary analysis following the conventions of the default locale, or a single string with a locale identifier, see stringi-locale
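As an illustration of the two ways of supplying break iterator settings, the sketch below passes the same option once through `...` and once as a list built with stri_opts_brkiter(). The sample string is made up for this example, and the comparison assumes that both forms resolve to the same settings.

```r
library(stringi)

# Made-up sample text for illustration only
x <- "Spam, spam. Eggs and bacon."

# Setting passed directly through `...`
via_dots <- stri_locate_all_boundaries(x, type = "sentence")

# The same setting supplied as a named list created by stri_opts_brkiter()
via_opts <- stri_locate_all_boundaries(
  x,
  opts_brkiter = stri_opts_brkiter(type = "sentence")
)

# Both calls are expected to locate the same segments
same_result <- identical(via_dots, via_opts)
print(same_result)
```

Passing settings through `...` is shorter for one-off calls, while a list created once with stri_opts_brkiter() can be reused across several calls.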


For stri_locate_all_*, a list of length(str) integer matrices is returned. The first column gives the start positions of substrings between located boundaries, and the second column gives the end positions. The indexes are code point-based, thus they may be passed, e.g., to stri_sub or stri_sub_all. Note that you get two NAs in one row if there is no match (and omit_no_match is FALSE) or there are missing data in the input vector.

stri_locate_first_* and stri_locate_last_* return an integer matrix with two columns, giving the start and end positions of the first or the last match, respectively, and two NAs if there is no match.
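The shape of the returned values can be sketched as follows. The input vector is made up for this example; the string '!!!' is assumed to contain no words, so it stands in for the no-match case.

```r
library(stringi)

# Made-up input: one ordinary string and one missing value
x <- c("Spam, spam, eggs.", NA)

# A list with one two-column (start, end) integer matrix per element of x
pos <- stri_locate_all_words(x)

# The positions are code point-based, so they can be passed to stri_sub()
words <- stri_sub(x[1], pos[[1]])
print(words)  # "Spam" "spam" "eggs"

# Missing input yields a single row of two NAs
print(pos[[2]])

# No match: one NA row by default, zero rows with omit_no_match = TRUE
na_row  <- stri_locate_all_words("!!!")
no_rows <- stri_locate_all_words("!!!", omit_no_match = TRUE)

# The first/last variants return one matrix with a row per element of x
first_pos <- stri_locate_first_words(x)
last_pos  <- stri_locate_last_words(x)
```

Setting omit_no_match = TRUE can be convenient when the matrices are bound together afterwards, since no NA rows need to be filtered out.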


Vectorized over str.

For more information on text boundary analysis performed by ICU's BreakIterator, see stringi-search-boundaries.

In the case of stri_locate_*_words, just like in stri_extract_all_words and stri_count_words, ICU's word BreakIterator is used to locate the word boundaries, and all non-word characters (UBRK_WORD_NONE rule status) are ignored. These functions are equivalent to a call to stri_locate_*_boundaries(str, type='word', skip_word_none=TRUE, locale=locale).
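The equivalence stated above can be checked with a short sketch. The sample sentence is made up for this example, and the default locale is assumed in both calls.

```r
library(stringi)

# Made-up sample sentence for illustration only
x <- "The above-mentioned features are very useful."

# Word locations via the convenience wrapper
by_words <- stri_locate_all_words(x)

# The same request expressed through the general boundaries function
by_boundaries <- stri_locate_all_boundaries(x, type = "word", skip_word_none = TRUE)

equivalent <- identical(by_words, by_boundaries)
print(equivalent)

# Without skip_word_none, spaces and punctuation are reported as segments too,
# so more rows are returned
all_segments <- stri_locate_all_boundaries(x, type = "word")
print(nrow(all_segments[[1]]) > nrow(by_words[[1]]))
```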

See Also

The official online manual of stringi at

Other search_locate: about_search, stri_locate_all()

Other indexing: stri_locate_all(), stri_sub_all(), stri_sub()

Other locale_sensitive: %s<%(), about_locale, about_search_boundaries, about_search_coll, stri_compare(), stri_count_boundaries(), stri_duplicated(), stri_enc_detect2(), stri_extract_all_boundaries(), stri_opts_collator(), stri_order(), stri_rank(), stri_sort_key(), stri_sort(), stri_split_boundaries(), stri_trans_tolower(), stri_unique(), stri_wrap()

Other text_boundaries: about_search_boundaries, about_search, stri_count_boundaries(), stri_extract_all_boundaries(), stri_opts_brkiter(), stri_split_boundaries(), stri_split_lines(), stri_trans_tolower(), stri_wrap()


test <- 'The\u00a0above-mentioned    features are very useful. Spam, spam, eggs, bacon, and spam.'
stri_locate_all_boundaries(test, type='line')
stri_locate_all_boundaries(test, type='word')
stri_locate_all_boundaries(test, type='sentence')
stri_locate_all_boundaries(test, type='character')

stri_extract_all_boundaries('Mr. Jones and Mrs. Brown are very happy.
So am I, Prof. Smith.', type='sentence', locale='en_US@ss=standard') # ICU >= 56 only
