This family of functions provides various ways of splitting a string up into pieces. These two functions return a character vector:
str_split_1() takes a single string and splits it into pieces, returning a single character vector.
str_split_i() splits each string in a character vector into pieces and extracts the i-th piece, returning a character vector.
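The two vector-returning functions can be compared on small inputs. This is a minimal sketch that assumes stringr 1.5.0 or later (where str_split_1() and str_split_i() are available). It shows that str_split_1() gives the same result as taking the first element of str_split(), and that str_split_i() returns NA when a string has fewer pieces than requested.

```r
library(stringr)

# str_split_1(): one input string in, one plain character vector out
pieces <- str_split_1("a-b-c", "-")
print(pieces)  # "a" "b" "c"

# Same result as unwrapping the one-element list from str_split()
stopifnot(identical(pieces, str_split("a-b-c", "-")[[1]]))

x <- c("a-b-c", "d-e")

# str_split_i(): one piece per input string
second     <- str_split_i(x, "-", 2)   # "b" "e"
last_piece <- str_split_i(x, "-", -1)  # "c" "e" (negative i counts from the right)
third      <- str_split_i(x, "-", 3)   # "c" NA  ("d-e" has only 2 pieces)
print(third)
```

Because str_split_1() rejects inputs that are not a single string, it is a safer choice than str_split(...)[[1]] when exactly one string is expected.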
These two functions return a more complex object:
str_split() splits each string in a character vector into a varying number of pieces, returning a list of character vectors.
str_split_fixed() splits each string in a character vector into a fixed number of pieces, returning a character matrix.
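The difference in output shape is easiest to see side by side. In this sketch (the inputs are illustrative only), str_split() keeps a different number of pieces per string, while str_split_fixed() and str_split(simplify = TRUE) return a matrix in which shorter rows are padded with "".

```r
library(stringr)

x <- c("a,b,c", "d,e")

# List of character vectors, one per input string
as_list <- str_split(x, ",")
print(lengths(as_list))  # 3 2

# Character matrix with exactly 3 columns
as_matrix <- str_split_fixed(x, ",", 3)
print(dim(as_matrix))    # 2 3
print(as_matrix[2, ])    # "d" "e" "" (padded)

# simplify = TRUE also gives a matrix, sized to the longest split
simplified <- str_split(x, ",", simplify = TRUE)
print(dim(simplified))   # 2 3
```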
str_split(string, pattern, n = Inf, simplify = FALSE)
str_split_1(string, pattern)
str_split_fixed(string, pattern, n)
str_split_i(string, pattern, i)
str_split_1(): a character vector.
str_split(): a list the same length as string/pattern containing character vectors.
str_split_fixed(): a character matrix with n columns and the same number of rows as the length of string/pattern.
str_split_i(): a character vector the same length as string/pattern.
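The "same length as string/pattern" wording means that both arguments are vectorised: the k-th string is split with the k-th pattern. A minimal sketch, using illustrative inputs that pair each string with its own separator:

```r
library(stringr)

strings  <- c("a-b", "c_d_e")
patterns <- c("-", "_")

# "a-b" is split on "-", "c_d_e" is split on "_"
res <- str_split(strings, patterns)
print(length(res))  # 2, one element per string/pattern pair

# str_split_i() is vectorised the same way
firsts <- str_split_i(strings, patterns, 1)
print(firsts)       # "a" "c"
```

A single pattern is recycled across all strings, which is the common case in the examples below.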
string: Input vector. Either a character vector, or something coercible to one.
pattern: Pattern to look for.
The default interpretation is a regular expression, as described in vignette("regular-expressions"). Use regex() for finer control of the matching behaviour.
Match a fixed string (i.e. by comparing only bytes) using fixed(). This is fast, but approximate. Generally, for matching human text, you'll want coll(), which respects character matching rules for the specified locale.
Match character, word, line and sentence boundaries with boundary(). An empty pattern, "", is equivalent to boundary("character").
n: Maximum number of pieces to return. The default (Inf) uses all possible split positions.
For str_split(), this determines the maximum length of each element of the output. For str_split_fixed(), this determines the number of columns in the output; if an input is too short, the result will be padded with "".
simplify: A boolean.
FALSE (the default): returns a list of character vectors.
TRUE: returns a character matrix.
i: Element to return. Use a negative value to count from the right-hand side.
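The pattern modifiers and n can be combined with any of the split functions. This sketch uses str_split_1() on illustrative single strings so each result is a plain vector; the same patterns work with str_split() and friends.

```r
library(stringr)

# fixed(): a literal "." (as a regex, "." would match any character)
by_dot <- str_split_1("1.2.3", fixed("."))                         # "1" "2" "3"

# regex() with options: split on "x" or "X"
no_case <- str_split_1("aXbxc", regex("x", ignore_case = TRUE))    # "a" "b" "c"

# boundary("word"): words only, spaces and punctuation are dropped
words <- str_split_1("Hello, world!", boundary("word"))            # "Hello" "world"

# An empty pattern behaves like boundary("character")
chars <- str_split_1("abc", "")                                    # "a" "b" "c"

# n caps the number of pieces; the last piece keeps the unsplit remainder
capped <- str_split("a-b-c-d", "-", n = 2)[[1]]                    # "a" "b-c-d"

print(words)
print(capped)
```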
See stri_split() for the underlying implementation.
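As a rough check of the relationship with stringi, a fixed() pattern in str_split() can be compared with stringi's stri_split_fixed(). This sketch assumes that str_split() hands fixed() patterns to stri_split_fixed(); it only compares results on one illustrative input and does not establish the exact internal dispatch.

```r
library(stringr)
library(stringi)

x <- c("a.b.c", "d.e")

# stringr front end with a literal separator
via_stringr <- str_split(x, fixed("."))

# stringi function assumed to do the same job
via_stringi <- stri_split_fixed(x, ".")

print(via_stringr)
print(via_stringi)
```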
library(stringr)
fruits <- c(
"apples and oranges and pears and bananas",
"pineapples and mangos and guavas"
)
str_split(fruits, " and ")
str_split(fruits, " and ", simplify = TRUE)
# If you want to split a single string, use `str_split_1`
str_split_1(fruits[[1]], " and ")
# Specify n to restrict the number of pieces returned
str_split(fruits, " and ", n = 3)
str_split(fruits, " and ", n = 2)
# If n is greater than the number of pieces, no padding occurs
str_split(fruits, " and ", n = 5)
# Use str_split_fixed() to return a character matrix
str_split_fixed(fruits, " and ", 3)
str_split_fixed(fruits, " and ", 4)
# str_split_i extracts only a single piece from a string
str_split_i(fruits, " and ", 1)
str_split_i(fruits, " and ", 4)
# Use a negative number to select from the end
str_split_i(fruits, " and ", -1)