concat.split: Split Concatenated Cells in a Dataset

Description

The concat.split function takes a column in which each cell holds multiple delimited values, splits those values into a list or into separate columns, and returns a new data.frame or data.table.

Usage

concat.split(data, split.col, sep = ",", structure = "compact",
  mode = NULL, type = NULL, drop = FALSE, fixed = FALSE,
  fill = NA, ...)

Arguments

data

The source data.frame or data.table.

split.col

The variable that needs to be split; can be specified either by the column number or the variable name.

sep

The character separating each value (defaults to ",").

structure

One of "compact", "expanded", or "list". Defaults to "compact". See Details.

mode

Can be either "binary" or "value"; "binary" is the default. "binary" recodes values to 1 or NA (like Boolean data, but without assuming 0 when data are not available). This setting only applies when structure = "expanded"; a warning is issued if it is used with other structures.

type

Can be either "numeric" or "character"; "numeric" is the default. This setting only applies when structure = "expanded"; a warning is issued if it is used with other structures.

drop

Logical (whether to remove the original variable from the output or not). Defaults to FALSE.

fixed

Logical. Should the sep value be treated as a fixed string (TRUE) or as a regular expression (FALSE)? Defaults to FALSE. See Details.

fill

The "fill" value for missing values when structure = "expanded". Defaults to NA.

...

Additional arguments to cSplit().

Details

structure

"compact" creates as many columns as the maximum length of the resulting split. This is the most useful general-case application of this function.
When the input is numeric, "expanded" creates as many columns as the maximum value of the input data. This is most useful when converting to mode = "binary".
"list" creates a single new column that is structurally a list within a data.frame or data.table.

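The three output shapes can be sketched in base R. This is an illustration only, not the package's implementation: the vector x is hypothetical data, and the code uses only strsplit() and friends rather than concat.split() itself.

```r
# Illustration only: base R, not the splitstackshape implementation.
x <- c("1,2,4", "2", "3,5")              # hypothetical concatenated column
parts <- strsplit(x, ",", fixed = TRUE)

# "list": keep each row's pieces together as one list element
as_list <- lapply(parts, as.numeric)

# "compact": one column per position, padded with NA up to the longest split
width <- max(lengths(parts))
compact <- t(vapply(parts, function(p) as.numeric(p)[seq_len(width)],
                    numeric(width)))

# "expanded": one column per possible value (1..max); binary-style coding
# marks presence with 1 and leaves everything else NA
top <- max(unlist(as_list))
expanded <- t(vapply(as_list, function(v) {
  row <- rep(NA_real_, top)
  row[v] <- 1                            # value-style coding would store v here
  row
}, numeric(top)))

dim(compact)    # 3 rows, 3 columns (longest split has 3 values)
dim(expanded)   # 3 rows, 5 columns (largest value is 5)
```

The difference in column counts is the point: "compact" is sized by how many values a cell holds, "expanded" by how large those values get.
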
fixed

When structure = "expanded" or structure = "list", it is possible to supply a regular expression containing the characters to split on. For example, to split on ",", ";", or "|", you can set sep = ",|;|\\|" or sep = "[,;|]", together with fixed = FALSE, to split on any of those characters. (The backslash that escapes the pipe must itself be doubled inside an R string.)
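
The effect of a fixed versus regular-expression separator can be seen with base R's strsplit(), which has an argument of the same name. This is a sketch under the assumption that concat.split() treats sep and fixed in a comparable way; the string x is made up for the example.

```r
x <- "a,b;c|d"   # hypothetical cell mixing three separators

# Regular expression (fixed = FALSE): a character class matches any one of , ; |
by_class <- strsplit(x, "[,;|]", fixed = FALSE)[[1]]

# Equivalent alternation; the pipe is escaped and the backslash is doubled
by_alt <- strsplit(x, ",|;|\\|", fixed = FALSE)[[1]]

# Literal matching (fixed = TRUE): the text "[,;|]" is searched for verbatim,
# is not present in x, and so nothing is split
literal <- strsplit(x, "[,;|]", fixed = TRUE)[[1]]

by_class   # "a" "b" "c" "d"
literal    # "a,b;c|d"
```
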

Examples

# NOT RUN {
## Load some data
temp <- head(concat.test)

# Split up the second column, selecting by column number
concat.split(temp, 2)

# ... or by name, and drop the original (unsplit) column
concat.split(temp, "Likes", drop = TRUE)

# The "Hates" column uses a different separator
concat.split(temp, "Hates", sep = ";", drop = TRUE)

# }
# NOT RUN {
# You'll get a warning here, when trying to retain the original values
concat.split(temp, 2, mode = "value", drop = TRUE)
# }
# NOT RUN {
# Try again. Notice the differing number of resulting columns
concat.split(temp, 2, structure = "expanded",
             mode = "value", type = "numeric", drop = TRUE)

# Let's try splitting some strings... Same syntax
concat.split(temp, 3, drop = TRUE)

# Strings can also be split to binary representations
concat.split(temp, 3, structure = "expanded",
             type = "character", fill = 0, drop = TRUE)

# Split up the "Likes" column into a list variable; retain original column
head(concat.split(concat.test, 2, structure = "list", drop = FALSE))

# View the structure of the output to verify
# that the new column is a list; note the
# difference between "Likes" and "Likes_list".
str(concat.split(temp, 2, structure = "list", drop = FALSE))

# }
