stringi (version 0.3-1)

stri_locate_all: Locate Occurrences of a Pattern

Description

These functions find the indices (positions) at which a given pattern is matched. stri_locate_all_* locates all the matches, whereas stri_locate_first_* and stri_locate_last_* give the first and the last match, respectively.

Usage

stri_locate_all(str, ..., regex, fixed, coll, charclass)

stri_locate_first(str, ..., regex, fixed, coll, charclass)

stri_locate_last(str, ..., regex, fixed, coll, charclass)

stri_locate(str, ..., regex, fixed, coll, charclass, mode = c("first", "all", "last"))

stri_locate_all_charclass(str, pattern, merge = TRUE)

stri_locate_first_charclass(str, pattern)

stri_locate_last_charclass(str, pattern)

stri_locate_all_coll(str, pattern, opts_collator = NULL)

stri_locate_first_coll(str, pattern, opts_collator = NULL)

stri_locate_last_coll(str, pattern, opts_collator = NULL)

stri_locate_all_regex(str, pattern, opts_regex = NULL)

stri_locate_first_regex(str, pattern, opts_regex = NULL)

stri_locate_last_regex(str, pattern, opts_regex = NULL)

stri_locate_all_fixed(str, pattern)

stri_locate_first_fixed(str, pattern)

stri_locate_last_fixed(str, pattern)

Arguments

str
character vector with strings to search in
...
additional arguments passed to the underlying functions
mode
single string; one of: "first" (the default), "all", "last"
pattern,regex,fixed,coll,charclass
character vector defining search patterns; for more details refer to stringi-search
merge
single logical value; indicates whether consecutive sequences of indices in the resulting matrix shall be merged; stri_locate_all_charclass only
opts_collator
a named list with ICU Collator's settings as generated with stri_opts_collator; NULL for default settings; stri_locate_*_coll only
opts_regex
a named list with ICU Regex settings as generated with stri_opts_regex; NULL for default settings; stri_locate_*_regex only

Value

  • For stri_locate_all*, a list of integer matrices is returned. Each list element represents the results of a separate search scenario. The first column gives the start positions of matches, and the second column gives the end positions. A row consisting of two NAs indicates no match or an NA argument.

  • stri_locate_first* and stri_locate_last*, on the other hand, return an integer matrix with two columns, giving the start and end positions of the first or the last matches, respectively, and two NAs if and only if they are not found.
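
The two return shapes described above can be compared side by side. This is a minimal sketch, assuming the stringi package is installed; the input strings and the variable names all_hits and first_hits are made up for illustration.

```r
library(stringi)

haystack <- c("abcabc", "xyz")

# stri_locate_all_*: a list with one two-column integer matrix per input string
all_hits <- stri_locate_all_fixed(haystack, "b")
length(all_hits)   # one list element per element of haystack
all_hits[[1]]      # one row per match: start and end positions
all_hits[[2]]      # a single row of NAs, because "xyz" contains no "b"

# stri_locate_first_*: a single two-column integer matrix, one row per string
first_hits <- stri_locate_first_fixed(haystack, "b")
dim(first_hits)
first_hits
```
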

Details

Vectorized over str and pattern.
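
As a small illustration of this vectorization, the sketch below pairs each string with the pattern at the same index and then reuses a single string for several patterns. It assumes the stringi package is installed; the inputs are illustrative only.

```r
library(stringi)

# element-wise pairing: "abc" is searched for "c", "bcd" for "b"
paired <- stri_locate_first_fixed(c("abc", "bcd"), c("c", "b"))
paired

# a single string is recycled against each pattern in turn
recycled <- stri_locate_first_fixed("abc", c("a", "b", "c"))
recycled
```
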

The matched string(s) may be extracted by calling the stri_sub function. Alternatively, you may call stri_extract directly.
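
The following sketch shows both routes on the same input: locating first and then cutting out the substring with stri_sub, and extracting in one step. It assumes the stringi package is installed; the strings and the names loc, via_sub and via_extract are hypothetical.

```r
library(stringi)

x <- c("order 66 shipped", "no digits here")

# locate the first run of digits, then extract it by position
loc <- stri_locate_first_regex(x, "[0-9]+")
via_sub <- stri_sub(x, loc[, 1], loc[, 2])

# the same result in a single call
via_extract <- stri_extract_first_regex(x, "[0-9]+")

via_sub
via_extract   # strings without a match yield NA in both cases
```
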

stri_locate, stri_locate_all, stri_locate_first, and stri_locate_last are convenience functions. They just call the appropriate stri_locate_*_* function, depending on the arguments used. Unless you are a very lazy person, please call the underlying functions directly for better performance.
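
To make the relationship concrete, the sketch below calls a convenience wrapper and the corresponding underlying function on the same input and checks that the results agree. It assumes the stringi package is installed; the variable names are illustrative.

```r
library(stringi)

# convenience wrapper: the search engine is chosen by the named argument
via_wrapper <- stri_locate_all("Bartolini", fixed = "i")

# direct call to the underlying function
direct <- stri_locate_all_fixed("Bartolini", "i")

identical(via_wrapper, direct)

# stri_locate additionally selects first/all/last through the mode argument
last_via_wrapper <- stri_locate("Bartolini", fixed = "i", mode = "last")
last_direct <- stri_locate_last_fixed("Bartolini", "i")
identical(last_via_wrapper, last_direct)
```
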

See Also

Other indexing: stri_locate_boundaries, stri_locate_words; stri_sub, stri_sub<-

Other search_locate: stri_locate_boundaries, stri_locate_words; stringi-search

Examples

stri_locate_all('XaaaaX',
   regex=c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_locate_all('Bartolini', fixed='i')
stri_locate_all('a b c', charclass='\\p{Zs}') # all white spaces

stri_locate_all_charclass(c('AbcdeFgHijK', 'abc', 'ABC'), '\\p{Ll}')
stri_locate_all_charclass(c('AbcdeFgHijK', 'abc', 'ABC'), '\\p{Ll}', merge=FALSE)
stri_locate_first_charclass('AaBbCc', '\\p{Ll}')
stri_locate_last_charclass('AaBbCc', '\\p{Ll}')

stri_locate_all_coll(c('AaaaaaaA', 'AAAA'), 'a')
stri_locate_first_coll(c('Yy\u00FD', 'AAA'), 'y',
   stri_opts_collator(strength=2, locale="sk_SK"))
stri_locate_last_coll(c('Yy\u00FD', 'AAA'), 'y',
   stri_opts_collator(strength=1, locale="sk_SK"))

pat <- stri_paste("\u0635\u0644\u0649 \u0627\u0644\u0644\u0647 ",
                  "\u0639\u0644\u064a\u0647 \u0648\u0633\u0644\u0645XYZ")
stri_locate_last_coll("\ufdfa\ufdfa\ufdfaXYZ", pat,
   stri_opts_collator(strength = 1))

stri_locate_all_fixed(c('AaaaaaaA', 'AAAA'), 'a')
stri_locate_first_fixed(c('AaaaaaaA', 'aaa', 'AAA'), 'a')
stri_locate_last_fixed(c('AaaaaaaA', 'aaa', 'AAA'), 'a')

# first row is 1-2, as in locate_first
stri_locate_all_fixed('bbbbb', 'bb')
stri_locate_first_fixed('bbbbb', 'bb')

# but last row is 3-4, unlike in locate_last,
# keep this in mind [overlapping pattern match OK]!
stri_locate_last_fixed('bbbbb', 'bb')

stri_locate_all_regex('XaaaaX',
   c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_locate_first_regex('XaaaaX',
   c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_locate_last_regex('XaaaaX',
   c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))

# Use regex positive-lookahead to locate overlapping pattern matches:
stri_locate_all_regex("ACAGAGACTTTAGATAGAGAAGA", "(?=AGA)")
# note that start > end here (match of 0 length)
