disk.frame (version 0.5.0)

shard: Shard a data.frame/data.table or disk.frame into chunks and save it as a disk.frame

Description

Shard a data.frame/data.table or disk.frame into chunks and save it as a disk.frame.

`distribute` is an alias for `shard`

Usage

shard(
  df,
  shardby,
  outdir = tempfile(fileext = ".df"),
  ...,
  nchunks = recommend_nchunks(df),
  overwrite = FALSE,
  shardby_function = "hash",
  sort_splits = NULL,
  desc_vars = NULL
)

distribute(...)

Arguments

df

A data.frame/data.table or disk.frame. If df is a disk.frame, then rechunk(df, ...) is run.

shardby

The column(s) to shard the data by.

outdir

The output directory of the disk.frame

...

Not used.

nchunks

The number of chunks

overwrite

If TRUE, the chunks are overwritten.

shardby_function

How rows are split into chunks: "hash" to use a hash function, or "sort" for semi-sorted chunks.

sort_splits

If shardby_function is "sort", the split values used for sharding.

desc_vars

If shardby_function is "sort", the variables to sort in descending order.
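
The difference between the two shardby_function options can be illustrated with a small base R sketch. This is a conceptual illustration only and is not disk.frame's internal code: toy_hash_chunk is a hypothetical helper, the split values are made up, and the interval rule used for the "sort" case is an assumption about how split values partition the data.

```r
# Conceptual sketch only: NOT disk.frame's internal hashing or sorting.
nchunks <- 3L

# Hypothetical toy hash: sum the character codes of each key,
# then map the result to a chunk id in 1..nchunks.
toy_hash_chunk <- function(key, nchunks) {
  codes <- vapply(as.character(key), function(k) sum(utf8ToInt(k)), numeric(1))
  as.integer(codes %% nchunks) + 1L
}

# "hash"-style: the chunk id depends only on the key value,
# so equal keys always land in the same chunk.
hash_id <- toy_hash_chunk(iris$Species, nchunks)

# "sort"-style (assumed rule): the chunk id is the interval of the
# split values that the key falls into, so chunks are ordered by key range.
sort_splits <- c(5.5, 6.5)  # made-up split values for Sepal.Length
sort_id <- findInterval(iris$Sepal.Length, sort_splits) + 1L

# Number of distinct chunks each Species was assigned to (1 for every Species).
chunks_per_species <- tapply(hash_id, iris$Species, function(x) length(unique(x)))
```

With a hash, which keys share a chunk is essentially arbitrary, whereas range-based splitting keeps neighbouring key values together.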

Examples

# NOT RUN {
# shard the iris data.frame by Species so that rows with the same Species are in the same chunk
iris.df = shard(iris, "Species")

# clean up iris.df
delete(iris.df)
# }
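
A slightly fuller sketch of the same call, setting the optional arguments from the Usage section explicitly. The output path and the chunk count are arbitrary choices for illustration, and dplyr::collect() is assumed to be available for reading the result back into memory.

```r
library(disk.frame)

# Arbitrary output location for this illustration
out <- file.path(tempdir(), "iris_by_species.df")

# Shard iris by Species, choosing the chunk count and output directory explicitly
iris.df <- shard(
  iris,
  shardby = "Species",
  outdir = out,
  nchunks = 3,
  overwrite = TRUE
)

# Read all chunks back into memory to confirm no rows were lost
result <- dplyr::collect(iris.df)
n_rows <- nrow(result)

# Remove the disk.frame from disk
delete(iris.df)
```

Because sharding uses the default "hash" function here, all rows for a given Species end up in the same chunk, but which Species share a chunk is not something the caller controls.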