datadr (version 0.8.4)

rrDiv: Random Replicate Division

Description

Specify the parameters for a random replicate division of the data.

Usage

rrDiv(nrows = NULL, seed = NULL)

Arguments

nrows
number of rows each subset should have
seed
the random seed to use (experimental)

Value

  • a list to be used for the "by" argument to divide

Details

The random replicate division method currently gets the total number of rows of the input data and divides it by nrows to get the number of subsets. Then it randomly assigns each row of the input data to one of the subsets, resulting in subsets with approximately nrows rows. A future implementation will make each subset have exactly nrows rows.
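The procedure described above can be sketched in base R. This is only an illustration and not datadr's internal code: the variable names are hypothetical, and rounding the number of subsets up with ceiling() is an assumption, since the text does not say how a non-integer quotient is handled.

```r
# Illustrative sketch only; not the datadr implementation.
set.seed(1234)                       # fixed seed so the sketch is reproducible

nrows <- 20                          # target number of rows per subset
n <- nrow(iris)                      # total number of rows in the input data
n_subsets <- ceiling(n / nrows)      # assumption: round the quotient up

# Randomly assign each row to one of the subsets
assignment <- sample(seq_len(n_subsets), n, replace = TRUE)

# Split the data by assignment; sizes are only approximately nrows
subsets <- split(iris, assignment)
sizes <- vapply(subsets, nrow, integer(1))
print(sizes)
```

Because each row is assigned independently, the subset sizes vary around nrows rather than matching it exactly, which is the behavior the paragraph above describes.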

References

  • http://tessera.io
  • Guha, S., Hafen, R., Rounds, J., Xia, J., Li, J., Xi, B., & Cleveland, W. S. (2012). Large complex data: divide and recombine (D&R) with RHIPE. Stat, 1(1), 53-67. http://onlinelibrary.wiley.com/doi/10.1002/sta4.7/full

See Also

divide, recombine, condDiv

Examples

library(datadr)

# divide iris data into random subsets with ~20 records per subset
irisRR <- divide(iris, by = rrDiv(20), update = TRUE)
irisRR
# look at the actual distribution of number of rows per subset
plot(splitRowDistn(irisRR))
