stringi (version 1.2.4)

stri_extract_all: Extract Occurrences of a Pattern

Description

These functions extract all substrings matching a given pattern.

stri_extract_all_* extracts all the matches. On the other hand, stri_extract_first_* and stri_extract_last_* provide the first or the last matches, respectively.

Usage

stri_extract_all(str, ..., regex, fixed, coll, charclass)

stri_extract_first(str, ..., regex, fixed, coll, charclass)

stri_extract_last(str, ..., regex, fixed, coll, charclass)

stri_extract(str, ..., regex, fixed, coll, charclass, mode = c("first", "all", "last"))

stri_extract_all_charclass(str, pattern, merge = TRUE, simplify = FALSE, omit_no_match = FALSE)

stri_extract_first_charclass(str, pattern)

stri_extract_last_charclass(str, pattern)

stri_extract_all_coll(str, pattern, simplify = FALSE, omit_no_match = FALSE, ..., opts_collator = NULL)

stri_extract_first_coll(str, pattern, ..., opts_collator = NULL)

stri_extract_last_coll(str, pattern, ..., opts_collator = NULL)

stri_extract_all_regex(str, pattern, simplify = FALSE, omit_no_match = FALSE, ..., opts_regex = NULL)

stri_extract_first_regex(str, pattern, ..., opts_regex = NULL)

stri_extract_last_regex(str, pattern, ..., opts_regex = NULL)

stri_extract_all_fixed(str, pattern, simplify = FALSE, omit_no_match = FALSE, ..., opts_fixed = NULL)

stri_extract_first_fixed(str, pattern, ..., opts_fixed = NULL)

stri_extract_last_fixed(str, pattern, ..., opts_fixed = NULL)

Arguments

str

character vector with strings to search in

...

supplementary arguments passed to the underlying functions, including additional settings for opts_collator, opts_regex, and so on

mode

single string; one of: "first" (the default), "all", "last"

pattern, regex, fixed, coll, charclass

character vector defining search patterns; for more details refer to stringi-search

merge

single logical value; should consecutive matches be merged into one string; stri_extract_all_charclass only

simplify

single logical value; if TRUE or NA, then a character matrix is returned; otherwise (the default), a list of character vectors is given, see Value; stri_extract_all_* only

omit_no_match

single logical value; if FALSE (the default), then a missing value will indicate that there was no match; if TRUE, an empty character vector is given for a no-match instead; stri_extract_all_* only

opts_collator, opts_fixed, opts_regex

a named list used to tune the search engine's settings; see stri_opts_collator, stri_opts_fixed, and stri_opts_regex, respectively; NULL for default settings
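As a minimal sketch of how some of these arguments interact (the input strings are made up for illustration), the code below passes a search-engine setting once through `...` and once through opts_regex, and contrasts merge=TRUE with merge=FALSE for the charclass variant:

```r
library(stringi)

# An engine setting passed via `...` or via an explicit options list.
via_dots <- stri_extract_all_regex("Ab aB", "ab", case_insensitive = TRUE)
via_opts <- stri_extract_all_regex("Ab aB", "ab",
    opts_regex = stri_opts_regex(case_insensitive = TRUE))

# merge=TRUE (the default) joins consecutive matching code points;
# merge=FALSE returns each matching code point separately.
merged   <- stri_extract_all_charclass("AbcdeFgHijK", "\\p{Ll}")
unmerged <- stri_extract_all_charclass("AbcdeFgHijK", "\\p{Ll}", merge = FALSE)

print(via_dots)
print(merged)
print(unmerged)
```

Both ways of supplying case_insensitive should give the same result here; the explicit list is convenient when the same options are reused across many calls.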

Value

For stri_extract_all*, if simplify=FALSE (the default), then a list of character vectors is returned. Each list element represents the results of a separate search scenario. If a pattern is not found and omit_no_match=FALSE, then a character vector of length 1 with a single NA value will be generated.

Otherwise, i.e. if simplify is not FALSE, then stri_list2matrix with the byrow=TRUE argument is called on the resulting object. In such a case, a character matrix with an appropriate number of rows (according to the length of str, pattern, etc.) is returned. Note that stri_list2matrix's fill argument is set to an empty string for simplify=TRUE and to NA for simplify=NA.

stri_extract_first* and stri_extract_last*, on the other hand, return a character vector. An NA element indicates no match.
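The return shapes described above can be illustrated with a short sketch. The input vector x is hypothetical and chosen so that the strings yield different numbers of matches:

```r
library(stringi)

x <- c("a1b22", "c3", "none")

# Default: a list with one character vector per input string;
# the no-match case is a single NA.
as_list <- stri_extract_all_regex(x, "[0-9]+")

# omit_no_match=TRUE: the no-match case is an empty character vector.
no_na <- stri_extract_all_regex(x, "[0-9]+", omit_no_match = TRUE)

# simplify=TRUE pads shorter rows with "", simplify=NA pads them with NA.
m_empty <- stri_extract_all_regex(x[1:2], "[0-9]+", simplify = TRUE)
m_na    <- stri_extract_all_regex(x[1:2], "[0-9]+", simplify = NA)

# First/last variants return a plain character vector, NA where nothing matched.
firsts <- stri_extract_first_regex(x, "[0-9]+")

print(as_list)
print(m_empty)
print(firsts)
```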

Details

Vectorized over str and pattern.
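A brief sketch of what vectorization means in practice, using made-up inputs: a shorter argument is recycled against the longer one, and arguments of equal length are matched element by element.

```r
library(stringi)

# One string, several patterns: the string is recycled.
one_str <- stri_extract_first_regex("XaaaaX", c("\\p{Ll}", "\\p{Ll}+"))

# Several strings, one pattern: the pattern is recycled.
one_pat <- stri_extract_first_regex(c("x1", "y22", "z"), "[0-9]+")

# Equal lengths: the i-th pattern is applied to the i-th string.
paired <- stri_extract_first_fixed(c("apple pie", "banana"), c("pie", "nan"))

print(one_str)
print(one_pat)
print(paired)
```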

If you would like to extract regex capture groups individually, check out stri_match.
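To make the difference concrete, the following sketch compares the two on a hypothetical "key=value" string: the extract function yields only the whole match, while stri_match_first_regex returns a matrix with one additional column per capture group.

```r
library(stringi)

s <- "key=value"
pat <- "(\\w+)=(\\w+)"

# The whole match only.
whole <- stri_extract_first_regex(s, pat)

# Column 1 is the whole match; columns 2 and 3 are the capture groups.
groups <- stri_match_first_regex(s, pat)

print(whole)
print(groups)
```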

stri_extract, stri_extract_all, stri_extract_first, and stri_extract_last are convenience functions. They just call stri_extract_*_*, depending on the arguments used. Calling one of those underlying functions directly will make your code run slightly faster.
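As an illustration of this dispatch, the sketch below pairs each convenience call with the underlying call it is expected to be equivalent to. It assumes the signatures listed under Usage, including the mode argument of stri_extract.

```r
library(stringi)

# The name of the pattern argument selects the search engine.
a <- stri_extract_all("Bartolini", coll = "i")
b <- stri_extract_all_coll("Bartolini", "i")

# mode selects first/all/last; "first" is the default.
c1 <- stri_extract("XaaaaX", regex = "\\p{Ll}+")
c2 <- stri_extract_first_regex("XaaaaX", "\\p{Ll}+")

d1 <- stri_extract("XaaaaX", regex = "\\p{Ll}", mode = "all")
d2 <- stri_extract_all_regex("XaaaaX", "\\p{Ll}")

print(identical(a, b))
print(identical(c1, c2))
print(identical(d1, d2))
```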

See Also

Other search_extract: stri_extract_all_boundaries, stri_match_all, stringi-search

Examples

# NOT RUN {
library(stringi)  # attach the package so the stri_* functions are available

stri_extract_all('XaaaaX', regex=c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_extract_all('Bartolini', coll='i')
stri_extract_all('stringi is so good!', charclass='\\p{Zs}') # all whitespaces

stri_extract_all_charclass(c('AbcdeFgHijK', 'abc', 'ABC'), '\\p{Ll}')
stri_extract_all_charclass(c('AbcdeFgHijK', 'abc', 'ABC'), '\\p{Ll}', merge=FALSE)
stri_extract_first_charclass('AaBbCc', '\\p{Ll}')
stri_extract_last_charclass('AaBbCc', '\\p{Ll}')

# }
# NOT RUN {
# emoji support available since ICU 57
stri_extract_all_charclass(stri_enc_fromutf32(32:55200), "\\p{EMOJI}")
# }
# NOT RUN {
stri_extract_all_coll(c('AaaaaaaA', 'AAAA'), 'a')
stri_extract_first_coll(c('Yy\u00FD', 'AAA'), 'y', strength=2, locale="sk_SK")
stri_extract_last_coll(c('Yy\u00FD', 'AAA'), 'y',  strength=1, locale="sk_SK")

stri_extract_all_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_extract_first_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_extract_last_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))

stri_list2matrix(stri_extract_all_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+')))
stri_extract_all_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+'), simplify=TRUE)
stri_extract_all_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+'), simplify=NA)

stri_extract_all_fixed("abaBAba", "Aba", case_insensitive=TRUE)
stri_extract_all_fixed("abaBAba", "Aba", case_insensitive=TRUE, overlap=TRUE)

# }