stringi (version 0.3-1)

stri_extract_all: Extract Occurrences of a Pattern

Description

These functions extract all substrings matching a given pattern.

stri_extract_all_* extract all matches, whereas stri_extract_first_* and stri_extract_last_* return only the first or the last match, respectively.

Usage

stri_extract_all(str, ..., regex, coll, charclass)

stri_extract_first(str, ..., regex, coll, charclass)

stri_extract_last(str, ..., regex, coll, charclass)

stri_extract(str, ..., regex, coll, charclass, mode = c("first", "all", "last"))

stri_extract_all_charclass(str, pattern, merge = TRUE, simplify = FALSE)

stri_extract_first_charclass(str, pattern)

stri_extract_last_charclass(str, pattern)

stri_extract_all_coll(str, pattern, simplify = FALSE, opts_collator = NULL)

stri_extract_first_coll(str, pattern, opts_collator = NULL)

stri_extract_last_coll(str, pattern, opts_collator = NULL)

stri_extract_all_regex(str, pattern, simplify = FALSE, opts_regex = NULL)

stri_extract_first_regex(str, pattern, opts_regex = NULL)

stri_extract_last_regex(str, pattern, opts_regex = NULL)

Arguments

str
character vector with strings to search in
...
additional arguments passed to the underlying functions
mode
single string; one of: "first" (the default), "all", "last"
pattern,regex,coll,charclass
character vector defining search patterns; for more details refer to stringi-search
merge
single logical value; should consecutive matches be merged into one string; stri_extract_all_charclass only
simplify
single logical value; if TRUE, then a character matrix is returned; otherwise (the default), a list of character vectors is given, see Value; stri_extract_all_* only
opts_collator
a named list with ICU Collator's settings as generated with stri_opts_collator; NULL for default settings; stri_extract_*_coll only
opts_regex
a named list with ICU Regex settings as generated with stri_opts_regex; NULL for default settings; stri_extract_*_regex only

Value

  • For stri_extract_all*, if simplify == FALSE (the default), a list of character vectors is returned. Each list element holds the results of a separate search scenario. If a pattern is not found, the corresponding element is a character vector of length 1 containing a single NA. Otherwise, i.e. if simplify == TRUE, stri_list2matrix with the byrow=TRUE argument is called on the resulting list, and a character matrix with an appropriate number of rows (according to the length of str, pattern, etc.) is returned.

    stri_extract_first* and stri_extract_last*, on the other hand, return a character vector. An NA element indicates no match.
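
The return shapes described above can be illustrated with a minimal sketch. The input vector x and the pattern are made up for illustration, and the fill value used for short rows of the simplified matrix is not asserted here, since it is not stated on this page.

```r
library(stringi)

# Hypothetical input: the second string contains no digits
x <- c("a1b22", "xyz")

# Default (simplify = FALSE): a list with one character vector per search
res_list <- stri_extract_all_regex(x, "[0-9]+")
# res_list[[1]] holds "1" and "22"; res_list[[2]] is a single NA

# simplify = TRUE: a character matrix with one row per search
res_mat <- stri_extract_all_regex(x, "[0-9]+", simplify = TRUE)

# First/last variants: a plain character vector, NA where nothing matched
res_first <- stri_extract_first_regex(x, "[0-9]+")
res_last  <- stri_extract_last_regex(x, "[0-9]+")
```

The list form is convenient when the number of matches varies per string; the matrix form suits tabular post-processing.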

Details

Vectorized over str and pattern.
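
As a rough illustration of this vectorization, the sketch below pairs strings and patterns element by element and then recycles a single string against several patterns. The inputs are hypothetical.

```r
library(stringi)

strs <- c("abc123", "def456")
pats <- c("[a-z]+", "[0-9]+")

# Element-wise pairing: strs[1] with pats[1], strs[2] with pats[2]
paired <- stri_extract_first_regex(strs, pats)

# A single string is recycled against every pattern
recycled <- stri_extract_first_regex("abc123", pats)
```

Here paired holds the letters of the first string and the digits of the second, while recycled holds the letters and the digits of the same string.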

Note that a stri_extract_*_fixed family of functions would not be useful: a match on a fixed pattern is always identical to the pattern itself. Thus, it has not been implemented in this version of stringi.

If you would like to extract regex capture groups individually, check out stri_match.
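
To show the difference in a small sketch (the key=value input is an assumed example, not taken from the package documentation), stri_extract_first_regex returns only the whole match, while stri_match_first_regex also returns each capture group in its own column.

```r
library(stringi)

s <- "key=value"
p <- "(\\w+)=(\\w+)"

# Whole match only
whole <- stri_extract_first_regex(s, p)

# Character matrix: column 1 is the whole match,
# the following columns are the capture groups
groups <- stri_match_first_regex(s, p)
```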

stri_extract, stri_extract_all, stri_extract_first, and stri_extract_last are convenience functions. They simply call the matching stri_extract_*_* function, depending on which of the regex, coll, or charclass arguments is supplied. For better performance, call the underlying functions directly.
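
A short sketch of this equivalence, using the same test string as the Examples section: each wrapper call should give the same result as the underlying function it dispatches to.

```r
library(stringi)

# Wrapper with a named regex argument vs. the direct call
via_wrapper <- stri_extract_all("XaaaaX", regex = "\\p{Ll}+")
direct      <- stri_extract_all_regex("XaaaaX", "\\p{Ll}+")

# stri_extract selects the variant through its mode argument
via_mode    <- stri_extract("XaaaaX", regex = "\\p{Ll}", mode = "last")
direct_last <- stri_extract_last_regex("XaaaaX", "\\p{Ll}")
```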

See Also

Other search_extract: stri_extract_words; stri_match, stri_match_all, stri_match_all_regex, stri_match_first, stri_match_first_regex, stri_match_last, stri_match_last_regex; stringi-search

Examples

stri_extract_all('XaaaaX', regex=c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_extract_all('Bartolini', coll='i')
stri_extract_all('stringi is so good!', charclass='\\p{Zs}') # all whitespaces

stri_extract_all_charclass(c('AbcdeFgHijK', 'abc', 'ABC'), '\\p{Ll}')
stri_extract_all_charclass(c('AbcdeFgHijK', 'abc', 'ABC'), '\\p{Ll}', merge=FALSE)
stri_extract_first_charclass('AaBbCc', '\\p{Ll}')
stri_extract_last_charclass('AaBbCc', '\\p{Ll}')

stri_extract_all_coll(c('AaaaaaaA', 'AAAA'), 'a')
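# the collator strength controls how strictly case and accents are compared
stri_extract_first_coll(c('Yy\u00FD', 'AAA'), 'y',
   opts_collator=stri_opts_collator(strength=2, locale="sk_SK"))
stri_extract_last_coll(c('Yy\u00FD', 'AAA'), 'y',
   opts_collator=stri_opts_collator(strength=1, locale="sk_SK"))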

stri_extract_all_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_extract_first_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_extract_last_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))

stri_list2matrix(stri_extract_all_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+')))
stri_extract_all_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+'), simplify=TRUE)
