h2o (version 2.8.4.4)

h2o.ddply: Split H2O dataset, apply function, and return results

Description

For each subset of an H2O dataset, apply a user-specified function, then combine the results.

Usage

h2o.ddply(.data, .variables, .fun = NULL, ..., .progress = "none")

Arguments

.data
An H2OParsedData object to be processed.
.variables
Variables to split .data by, either the indices or names of a set of columns.
.fun
Function to apply to each subset grouping. Must have been pushed to H2O using h2o.addFunction.
...
Additional arguments passed on to .fun. (Currently unimplemented.)
.progress
Name of the progress bar to use. (Currently unimplemented.)

Value

  • An H2OParsedData object containing the results from the split/apply operation, arranged row-by-row.

Details

This is an extension of the plyr library's ddply function to datasets loaded into H2O.
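For readers unfamiliar with the split/apply/combine pattern, the following is a minimal base R sketch of the same idea on the built-in iris data frame. It runs entirely in local R and does not use H2O or plyr; the names mean_sepal_len, groups, means, and res_local are illustrative only, and the H2O implementation may differ internally.

```r
# Base R sketch of split-apply-combine on the built-in iris data frame.
# This runs locally and does not involve H2O.

# Function applied to each subset: mean of the first column (Sepal.Length)
mean_sepal_len <- function(df) sum(df[, 1], na.rm = TRUE) / nrow(df)

# Split: one data frame per species
groups <- split(iris, iris$Species)

# Apply: evaluate the function on each subset, expecting one number per group
means <- vapply(groups, mean_sepal_len, numeric(1))

# Combine: arrange the results row by row, one row per group
res_local <- data.frame(Species = names(means),
                        mean_sepal_len = unname(means))
print(res_local)
```

With h2o.ddply, the same three steps are carried out on an H2OParsedData object, and the function must first be pushed to H2O with h2o.addFunction.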

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. http://www.jstatsoft.org/v40/i01/.

See Also

h2o.addFunction

Examples

library(h2o)
localH2O = h2o.init()

# Import iris dataset to H2O
irisPath = system.file("extdata", "iris_wheader.csv", package = "h2o")
iris.hex = h2o.importFile(localH2O, path = irisPath, key = "iris.hex")

# Add function taking mean of sepal_len column
fun = function(df) { sum(df[,1], na.rm = T)/nrow(df) }
h2o.addFunction(localH2O, fun)

# Apply function to groups by class of flower
# uses h2o's ddply, since iris.hex is an H2OParsedData object
res = h2o.ddply(iris.hex, "class", fun)
head(res)
