str_split: Split up a string into pieces

Description

This family of functions provides various ways of splitting a string up into pieces. These two functions return a character vector:

str_split_1() takes a single string and splits it into pieces, returning a single character vector.
str_split_i() splits each string in a character vector into pieces and extracts the ith value, returning a character vector.

These two functions return a more complex object:

str_split() splits each string in a character vector into a varying number of pieces, returning a list of character vectors.
str_split_fixed() splits each string in a character vector into a fixed number of pieces, returning a character matrix.

Usage

str_split(string, pattern, n = Inf, simplify = FALSE)
str_split_1(string, pattern)
str_split_fixed(string, pattern, n)
str_split_i(string, pattern, i)

Value

str_split_1(): a character vector.
str_split(): a list the same length as string/pattern containing character vectors.
str_split_fixed(): a character matrix with n columns and the same number of rows as the length of string/pattern.
str_split_i(): a character vector the same length as string/pattern.

Arguments

string

Input vector. Either a character vector, or something coercible to one.

pattern

Pattern to look for.

The default interpretation is a regular expression, as described in vignette("regular-expressions"). Use regex() for finer control of the matching behaviour.

Match a fixed string (i.e. by comparing only bytes), using fixed(). This is fast, but approximate. Generally, for matching human text, you'll want coll() which respects character matching rules for the specified locale.

Match character, word, line and sentence boundaries with boundary(). An empty pattern, "", is equivalent to boundary("character").
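
The pattern types described above can be compared side by side. The following is a minimal sketch, assuming stringr is installed; the input strings and variable names are made up for illustration.

```r
library(stringr)

# Default interpretation: a regular expression.
# Split on a comma followed by optional whitespace.
by_regex <- str_split_1("a, b,c", ",\\s*")

# fixed() matches the pattern literally, so "." is a plain dot here
# rather than the regex wildcard.
by_fixed <- str_split_1("a.b.c", fixed("."))

# An empty pattern splits the string into individual characters.
by_char <- str_split_1("abc", "")

# boundary("word") splits at word boundaries instead of at a delimiter.
by_word <- str_split_1("The quick brown fox", boundary("word"))

print(by_regex)
print(by_fixed)
print(by_char)
print(by_word)
```

Without fixed(), the "." in the second call would be read as a regular expression that matches any character, so wrapping literal delimiters in fixed() is usually the safer choice.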

n

Maximum number of pieces to return. Default (Inf) uses all possible split positions.

For str_split(), this determines the maximum length of each element of the output. For str_split_fixed(), this determines the number of columns in the output; if an input is too short, the result will be padded with "".

simplify

A boolean.

FALSE (the default): returns a list of character vectors.
TRUE: returns a character matrix.

i

Element to return. Use a negative value to count from the right-hand side.

Examples

library(stringr)

fruits <- c(
  "apples and oranges and pears and bananas",
  "pineapples and mangos and guavas"
)

str_split(fruits, " and ")
str_split(fruits, " and ", simplify = TRUE)

# If you want to split a single string, use `str_split_1()`
str_split_1(fruits[[1]], " and ")

# Specify n to restrict the maximum number of pieces
str_split(fruits, " and ", n = 3)
str_split(fruits, " and ", n = 2)
# If n is greater than the number of pieces, no padding occurs
str_split(fruits, " and ", n = 5)

# Use str_split_fixed() to return a character matrix
str_split_fixed(fruits, " and ", 3)
str_split_fixed(fruits, " and ", 4)

# str_split_i extracts only a single piece from a string
str_split_i(fruits, " and ", 1)
str_split_i(fruits, " and ", 4)
# Use a negative number to select from the end
str_split_i(fruits, " and ", -1)