Shard a data.frame/data.table or disk.frame into chunks and save the result as a disk.frame.
`distribute` is an alias for `shard`
shard(df, shardby, outdir = tempfile(fileext = ".df"), ..., nchunks = recommend_nchunks(df), overwrite = FALSE, shardby_function = "hash", sort_splits = NULL, desc_vars = NULL)
distribute(...)
df: A data.frame/data.table or disk.frame. If it is a disk.frame, `rechunk(df, ...)` is run.
shardby: The column(s) to shard the data by.
outdir: The output directory of the disk.frame.
...: Not used.
nchunks: The number of chunks.
overwrite: If TRUE, the chunks are overwritten.
shardby_function: How the data is split into chunks: "hash" to use a hash function, or "sort" for semi-sorted chunks.
sort_splits: If shardby_function is "sort", the split values used for sharding.
desc_vars: If shardby_function is "sort", the variables to sort in descending order.
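The two shardby_function options can be illustrated without disk.frame itself. The base-R sketch below is a hypothetical illustration, not disk.frame's implementation: `simple_hash`, `hash_chunk_id` and `sort_chunk_id` are made-up helpers, the hash is a toy djb2-style function rather than whatever disk.frame uses internally, and the "sort" variant assumes sort_splits are ascending cut points on a single numeric column (desc_vars is not modeled). The point it shows is the invariant behind "hash" sharding: rows with equal shardby values always land in the same chunk.

```r
# Hypothetical toy hash (djb2-style); NOT disk.frame's internal hash.
# Values stay below 2^31, so h * 33 + code is exact in double precision.
simple_hash <- function(x) {
  vapply(as.character(x), function(s) {
    h <- 5381
    for (code in utf8ToInt(s)) {
      h <- (h * 33 + code) %% 2147483647
    }
    h
  }, numeric(1), USE.NAMES = FALSE)
}

# Hypothetical "hash" sharding: combine the shardby column(s) into one key
# per row, hash it, and map the hash onto chunk ids 1..nchunks.
hash_chunk_id <- function(df, shardby, nchunks) {
  key <- do.call(paste, c(df[shardby], sep = "\r"))
  simple_hash(key) %% nchunks + 1
}

# Hypothetical "sort" sharding on one numeric column with ascending splits:
# chunk 1 holds values < splits[1], chunk k holds [splits[k-1], splits[k]).
sort_chunk_id <- function(df, shardby, sort_splits) {
  findInterval(df[[shardby]], sort_splits) + 1
}

# Usage: shard iris by Species into 2 chunks, and by Sepal.Length into 4.
ids <- hash_chunk_id(iris, "Species", nchunks = 2)
hash_chunks <- split(iris, ids)
ids_sorted <- sort_chunk_id(iris, "Sepal.Length", sort_splits = c(5, 6, 7))
sort_chunks <- split(iris, ids_sorted)
```

With hashing, chunk sizes depend on how the keys happen to hash, so a small number of distinct keys can leave some chunks empty; the split-value approach instead gives chunks whose value ranges do not overlap, which is what makes them "semi-sorted".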
library(disk.frame)
# shard the iris data.frame by Species so that rows with the same Species are in the same chunk
iris.df = shard(iris, "Species")
# clean up iris.df
delete(iris.df)