
Description

These functions split each element of str into substrings. pattern indicates delimiters that separate the input into tokens. The input data between the matches become the fields themselves.

Usage

stri_split(str, ..., regex, fixed, coll, charclass)

stri_split_fixed(str, pattern, n_max = -1L, omit_empty = FALSE,
  tokens_only = FALSE, simplify = FALSE)

stri_split_regex(str, pattern, n_max = -1L, omit_empty = FALSE,
  tokens_only = FALSE, simplify = FALSE, opts_regex = NULL)

stri_split_coll(str, pattern, n_max = -1L, omit_empty = FALSE,
  tokens_only = FALSE, simplify = FALSE, opts_collator = NULL)

stri_split_charclass(str, pattern, n_max = -1L, omit_empty = FALSE,
  tokens_only = FALSE, simplify = FALSE)

Arguments

...
  arguments passed on to the underlying function; stri_split only

omit_empty
  whether empty strings are removed from the output (TRUE) or replaced with NAs (NA)

tokens_only
  may affect the result if n_max is positive, see Details

simplify
  if TRUE, then a character matrix is returned; otherwise (the default), a list of character vectors is given, see Value

opts_regex
  see stri_opts_regex; NULL for default settings; stri_split_regex only

opts_collator
  see stri_opts_collator; NULL for default settings; stri_split_coll only

Value

If simplify == FALSE (the default), then the functions return a list of character vectors.

Otherwise, stri_list2matrix with the byrow=TRUE argument is called on the resulting object. In such a case, a character matrix with an appropriate number of rows (according to the length of str, pattern, etc.) is returned.
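
The two return shapes can be compared side by side. This is only a minimal sketch, assuming the stringi package is installed; the input vector x and the variable names are made up for illustration.

```r
library(stringi)

x <- c("ab,c", "d,ef,g")

# Default (simplify = FALSE): a list with one character vector per input string.
res_list <- stri_split_fixed(x, ",")

# simplify = TRUE: a character matrix with one row per input string.
res_mat <- stri_split_fixed(x, ",", simplify = TRUE)

print(is.list(res_list))  # TRUE
print(dim(res_mat))       # 2 3
```

Rows that yield fewer tokens than the widest one are padded so that the result stays rectangular.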

Details

Vectorized over str, pattern, n_max, and omit_empty.

If n_max is negative (default), then all pieces are extracted. Otherwise, if tokens_only is FALSE (this is the default, for compatibility with the stringr package), then n_max - 1 tokens are extracted (if possible) and the n_max-th string gives the (non-split) remainder (see Examples). On the other hand, if tokens_only is TRUE, then only full tokens (up to n_max pieces) are extracted.
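
The interaction between n_max and tokens_only is easiest to see on one input. The following is a small sketch, assuming stringi is installed; the variable names are illustrative only.

```r
library(stringi)

x <- "a_b_c__d"

# Negative n_max (the default): every piece is returned.
all_pieces <- stri_split_fixed(x, "_")[[1]]

# n_max = 2, tokens_only = FALSE: one token, then the unsplit remainder.
with_rest <- stri_split_fixed(x, "_", n_max = 2)[[1]]

# n_max = 2, tokens_only = TRUE: two full tokens, the rest is dropped.
tokens <- stri_split_fixed(x, "_", n_max = 2, tokens_only = TRUE)[[1]]

print(all_pieces)  # "a" "b" "c" ""  "d"
print(with_rest)   # "a"      "b_c__d"
print(tokens)      # "a" "b"
```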

omit_empty is applied during splitting: if it is set to TRUE, then tokens of zero length are ignored. Thus, empty strings will never appear in the resulting vector. On the other hand, if omit_empty is NA, then empty tokens are substituted with missing strings.
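
The three omit_empty settings can be contrasted on a string with two adjacent delimiters. A brief sketch, assuming stringi is installed; the variable names are illustrative only.

```r
library(stringi)

x <- "a_b_c__d"

# FALSE (the default): the empty token between "__" is kept.
kept <- stri_split_fixed(x, "_", omit_empty = FALSE)[[1]]

# TRUE: zero-length tokens are skipped.
dropped <- stri_split_fixed(x, "_", omit_empty = TRUE)[[1]]

# NA: zero-length tokens become missing strings.
as_na <- stri_split_fixed(x, "_", omit_empty = NA)[[1]]

print(kept)     # "a" "b" "c" ""  "d"
print(dropped)  # "a" "b" "c" "d"
print(as_na)    # "a" "b" "c" NA  "d"
```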

Empty search patterns are not supported. To split a string into individual characters, use e.g. stri_split_boundaries(str, opts_brkiter=stri_opts_brkiter(type="character")), which splits at Unicode character boundaries.
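
A character-level split might look as follows. This is a sketch only, assuming stringi is installed and that the break iterator options are passed through the named opts_brkiter argument; the input string is arbitrary.

```r
library(stringi)

# Split at character boundaries instead of at a search pattern.
chars <- stri_split_boundaries(
  "abc",
  opts_brkiter = stri_opts_brkiter(type = "character")
)[[1]]

print(chars)  # "a" "b" "c"
```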

stri_split is a convenience function. It calls either stri_split_regex, stri_split_fixed, stri_split_coll, or stri_split_charclass, depending on the argument used. Unless you are a very lazy person, please call the underlying functions directly for better performance.
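
The dispatch can be checked by comparing the wrapper with a direct call. A minimal sketch, assuming stringi is installed; the input and variable names are illustrative.

```r
library(stringi)

x <- "a_b_c"

# The name of the pattern argument selects the underlying function.
via_wrapper <- stri_split(x, fixed = "_")
direct <- stri_split_fixed(x, "_")

print(identical(via_wrapper, direct))  # TRUE
```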

See Also

stri_split_boundaries; stri_split_lines, stri_split_lines1; stringi-search

Examples

library(stringi)

# basic splitting; adjacent delimiters produce an empty token
stri_split_fixed("a_b_c_d", "_")
stri_split_fixed("a_b_c__d", "_")
stri_split_fixed("a_b_c__d", "_", omit_empty=TRUE)

# n_max and tokens_only
stri_split_fixed("a_b_c__d", "_", n_max=2, tokens_only=FALSE) # "a" & remainder
stri_split_fixed("a_b_c__d", "_", n_max=2, tokens_only=TRUE) # "a" & "b" only
stri_split_fixed("a_b_c__d", "_", n_max=4, omit_empty=TRUE, tokens_only=TRUE)
stri_split_fixed("a_b_c__d", "_", n_max=4, omit_empty=FALSE, tokens_only=TRUE)
stri_split_fixed("a_b_c__d", "_", omit_empty=NA)

# several input strings with increasing n_max
stri_split_fixed(c("ab_c", "d_ef_g", "h", ""), "_", n_max=1, tokens_only=TRUE, omit_empty=TRUE)
stri_split_fixed(c("ab_c", "d_ef_g", "h", ""), "_", n_max=2, tokens_only=TRUE, omit_empty=TRUE)
stri_split_fixed(c("ab_c", "d_ef_g", "h", ""), "_", n_max=3, tokens_only=TRUE, omit_empty=TRUE)

# matrix output
stri_list2matrix(stri_split_fixed(c("ab,c", "d,ef,g", ",h", ""), ",", omit_empty=TRUE))
stri_split_fixed(c("ab,c", "d,ef,g", ",h", ""), ",", omit_empty=TRUE, simplify=TRUE)
stri_split_fixed(c("ab,c", "d,ef,g", ",h", ""), ",", omit_empty=FALSE, simplify=TRUE)
stri_split_fixed(c("ab,c", "d,ef,g", ",h", ""), ",", omit_empty=NA, simplify=TRUE)

# regex delimiter: a comma with optional surrounding white space
stri_split_regex(c("ab,c", "d,ef , g", ", h", ""),
  "\\p{WHITE_SPACE}*,\\p{WHITE_SPACE}*", omit_empty=NA, simplify=TRUE)

# character class delimiter; omit_empty is vectorized
stri_split_charclass("Lorem ipsum dolor sit amet", "\\p{WHITE_SPACE}")
stri_split_charclass(" Lorem ipsum dolor", "\\p{WHITE_SPACE}", n_max=3,
  omit_empty=c(FALSE, TRUE))

stri_split_regex("Lorem ipsum dolor sit amet",
  "\\p{Z}+") # see also stri_split_charclass