plyr (version 1.8.3)

daply: Split data frame, apply function, and return results in an array.

Description

For each subset of a data frame, apply a function, then combine the results into an array. daply with a function that operates column-wise is similar to aggregate. To apply a function to each row, use aaply with .margins set to 1.
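
The comparison above can be sketched on a small example. The data frame `scores`, the matrix `m`, and the result names below are invented for illustration and are not part of plyr; the sketch assumes plyr is installed.

```r
library(plyr)

# Toy data, invented for this illustration
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  x     = c(1, 3, 2, 4, 6)
)

# daply: split by group, apply a function to each piece,
# and collect the scalar results in a named vector
group_means <- daply(scores, .(group), function(df) mean(df$x))

# aggregate: the same per-group means, returned as a data frame
agg_means <- aggregate(x ~ group, data = scores, FUN = mean)

# aaply with .margins = 1 applies a function to each row of a matrix
m <- matrix(1:6, nrow = 2)
row_sums <- aaply(m, 1, sum)
```

Here daply and aggregate compute the same numbers; the difference is the container, an array (here a named vector) versus a data frame.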

Usage

daply(.data, .variables, .fun = NULL, ..., .progress = "none",
  .inform = FALSE, .drop_i = TRUE, .drop_o = TRUE, .parallel = FALSE,
  .paropts = NULL)

Arguments

.data
data frame to be processed
.variables
variables to split data frame by, as quoted variables, a formula or character vector
.fun
function to apply to each piece
...
other arguments passed on to .fun
.progress
name of the progress bar to use, see create_progress_bar
.inform
produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging
.drop_i
should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, default)
.drop_o
should extra dimensions of length 1 in the output be dropped, simplifying the output. Defaults to TRUE
.parallel
if TRUE, apply function in parallel, using parallel backend provided by foreach
.paropts
a list of additional options passed into the foreach function when parallel computation is enabled. This is important if (for example) your code relies on external data or packages: use the .e
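
As a minimal sketch of one of the arguments above, the following compares the default output with `.drop_o = FALSE`. The data frame `d` and the result names are hypothetical, chosen only for this illustration.

```r
library(plyr)

# Toy data, invented for this illustration
d <- data.frame(g = c("a", "a", "b"), x = 1:3)

# Default (.drop_o = TRUE): the length-1 output dimension is dropped,
# so the result is a plain named vector
counts <- daply(d, .(g), nrow)

# .drop_o = FALSE keeps that dimension, giving an array with
# one row per group and a single column
counts_kept <- daply(d, .(g), nrow, .drop_o = FALSE)

dim(counts)       # no dim attribute
dim(counts_kept)  # two dimensions
```

Keeping the extra dimension can be convenient when downstream code expects a result with a fixed number of dimensions regardless of what .fun returns.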

Value

If results are atomic with the same type and dimensionality, a vector, matrix or array; otherwise, a list-array (a list with dimensions).
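
The atomic case can be sketched as follows, assuming a toy data frame `d` invented for this illustration. A scalar result per piece combines into a vector, and a length-2 numeric result per piece combines into a matrix.

```r
library(plyr)

# Toy data, invented for this illustration
d <- data.frame(g = c("a", "a", "b", "b"), x = c(1, 3, 10, 20))

# One number per piece: the results combine into a named vector
totals <- daply(d, .(g), function(df) sum(df$x))

# A named numeric vector of length 2 per piece: the results combine
# into a numeric matrix with one row per group and one column per statistic
stats <- daply(d, .(g), function(df) c(mean = mean(df$x), max = max(df$x)))

stats["a", "mean"]
stats["b", "max"]
```

The row names of `stats` come from the values of the splitting variable, and the column names come from the names of the vector returned by the applied function.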

Input

This function splits data frames by variables.

Output

If there are no results, then this function will return a vector of length 0 (vector()).

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. http://www.jstatsoft.org/v40/i01/.

See Also

Other array output: aaply; laply; maply

Other data frame input: d_ply; ddply; dlply

Examples

library(plyr)

# Count the rows for each year
daply(baseball, .(year), nrow)

# Several different ways of summarising by variables that should not be
# included in the summary

daply(baseball[, c(2, 6:9)], .(year), colwise(mean))
daply(baseball[, 6:9], .(baseball$year), colwise(mean))
daply(baseball, .(year), function(df) colwise(mean)(df[, 6:9]))