splitstackshape (version 1.2.0)

concat.split: Split concatenated cells in a data.frame

Description

The concat.split function takes a column with multiple values, splits the values into a list or into separate columns, and returns a new data.frame.

Usage

concat.split(data, split.col, sep = ",",
    structure = "compact", mode = NULL, drop = FALSE,
    fixed = FALSE, fill = NA)

Arguments

data
The source data.frame.
split.col
The variable that needs to be split; can be specified either by the column number or the variable name.
sep
The character separating each value (defaults to ",").
structure
Can be either "compact", "expanded", or "list". Defaults to "compact". See Details.
mode
Can be either "binary" or "value" (where "binary" is default and it recodes values to 1 or NA, like Boolean data, but without assuming 0 when data is not available). This setting only applies wh
drop
Logical (whether to remove the original variable from the output or not). Defaults to FALSE.
fixed
Is the input for the sep value fixed, or a regular expression? See Details.
fill
The "fill" value for missing values when structure = "expanded". Defaults to NA.

Details

structure
  • "compact" creates as many columns as the maximum length of the resulting split. This is the most useful general-case application of this function.
  • When the input is numeric, "expanded" creates as many columns as the maximum value of the input data. This is most useful when converting to mode = "binary".
  • "list" creates a single new column that is structurally a list within a data.frame.
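
The three layouts described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data frame `df` and the names `pieces`, `as_list`, `compact`, and `expanded` are hypothetical and only approximate the shape of each result.

```r
# Hypothetical toy data; not the package's concat.test dataset
df <- data.frame(id = 1:3, vals = c("1,3", "2", "1,2,4"),
                 stringsAsFactors = FALSE)

# Split each cell on the separator and convert the pieces to numbers
pieces <- lapply(strsplit(df$vals, ",", fixed = TRUE), as.numeric)

# "list"-like: a single list column stored inside the data.frame
as_list <- df
as_list$vals_list <- pieces

# "compact"-like: as many columns as the longest split, padded with NA
width <- max(lengths(pieces))
compact <- t(vapply(pieces,
                    function(p) c(p, rep(NA_real_, width - length(p))),
                    numeric(width)))

# "expanded"-like with binary recoding: one column per possible value,
# holding 1 where that value occurs and NA elsewhere
top <- max(unlist(pieces))
expanded <- t(vapply(pieces, function(p) {
  row <- rep(NA_real_, top)
  row[p] <- 1
  row
}, numeric(top)))
```

Here `compact` has three columns (the longest cell holds three values), while `expanded` has four (the largest value is 4), which is why the two structures can return different numbers of columns for the same input.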
fixed
  • When structure = "expanded" or structure = "list", it is possible to supply a regular expression containing the characters to split on. For example, to split on ",", ";", or "|", you can set sep = ",|;|\\|" or sep = "[,;|]", and fixed = FALSE to split on any of those characters.
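
The difference between a regular-expression separator and a fixed one can be seen with base R's strsplit(), which has its own `fixed` argument. This is only a sketch of the general behaviour, assuming concat.split treats `sep` in a comparable way; the input string `x` and the result names are hypothetical.

```r
x <- "a,b;c|d"

# Character class: matches any one of ",", ";" or "|"
by_class <- strsplit(x, "[,;|]")[[1]]

# Alternation: the pipe must be escaped to be taken literally
by_alternation <- strsplit(x, ",|;|\\|")[[1]]

# fixed = TRUE treats the separator as a literal string, not a pattern
by_fixed <- strsplit(x, ",", fixed = TRUE)[[1]]
```

Both regular-expression forms split on all three characters, whereas the fixed form splits only on the comma and leaves "b;c|d" as a single piece.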

References

  • See http://stackoverflow.com/q/10100887/1270695 for some history of this function, even though the solution is not used at all here.
  • The "condensed" setting was inspired by an answer from David Winsemius to a question at Stack Overflow. See: http://stackoverflow.com/a/13924245/1270695

See Also

concat.split.compact, concat.split.expanded, concat.split.list, concat.split.multiple

Examples

## Load the package and some data
library(splitstackshape)
temp <- head(concat.test)

# Split up the second column, selecting by column number
concat.split(temp, 2)

# ... or by name, and drop the original "Likes" column
concat.split(temp, "Likes", drop = TRUE)

# The "Hates" column uses a different separator
concat.split(temp, "Hates", sep = ";", drop = TRUE)

# You'll get a warning here, when trying to retain the original values
concat.split(temp, 2, mode = "value", drop = TRUE)

# Try again. Notice the differing number of resulting columns
concat.split(temp, 2, structure = "expanded",
             mode = "value", drop = TRUE)

# Let's try splitting some strings... Same syntax
concat.split(temp, 3, drop = TRUE)

# Split up the "Likes" column into a list variable; retain original column
head(concat.split(concat.test, 2, structure = "list", drop = FALSE))

# View the structure of the output to verify
# that the new column is a list; note the
# difference between "Likes" and "Likes_list".
str(concat.split(temp, 2, structure = "list", drop = FALSE))