disk.frame (version 0.1.0)

hard_group_by: Perform a hard group by

Description

A hard_group_by is a group by that also reorganizes the chunks so that every unique grouping of `by` is in the same chunk. In other words, all rows that share the same `by` value end up in the same chunk.
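The idea of keeping every row with the same grouping value in one chunk can be sketched in base R, without disk.frame. This is only an illustration of the concept: `hard_group_sketch` is a hypothetical helper, and its modulo-based assignment of keys to chunks is an assumption made for the example, not a description of how disk.frame shards data internally.

```r
# Illustrative only: base R, no disk.frame required.
# `hard_group_sketch` is a hypothetical helper, not part of disk.frame.
hard_group_sketch <- function(df, by, nchunks) {
  # Map each distinct key to an integer id, then to a chunk number in 1..nchunks.
  key <- as.character(df[[by]])
  key_id <- match(key, unique(key))
  chunk_id <- (key_id - 1L) %% nchunks + 1L
  # Split rows by chunk; rows with the same key always get the same chunk_id.
  split(df, factor(chunk_id, levels = seq_len(nchunks)))
}

chunks <- hard_group_sketch(iris, by = "Species", nchunks = 2)

# Each Species value should appear in exactly one chunk.
species_per_chunk <- lapply(chunks, function(ch) unique(as.character(ch$Species)))
chunks_per_species <- table(unlist(species_per_chunk))
print(species_per_chunk)
stopifnot(all(chunks_per_species == 1))
```

A chunk may hold more than one group, but no group is ever split across chunks, which is the property that lets a per-group computation run on each chunk independently.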

Usage

hard_group_by(df, ..., add = FALSE, .drop = FALSE)

# S3 method for data.frame
hard_group_by(df, ..., add = FALSE, .drop = FALSE)

# S3 method for disk.frame
hard_group_by(df, ..., outdir = tempfile("tmp_disk_frame_hard_group_by"), nchunks = disk.frame::nchunks(df), overwrite = TRUE)

Arguments

df

a disk.frame

...

grouping variables

add

same as dplyr::group_by

.drop

same as dplyr::group_by

outdir

the output directory

nchunks

The number of chunks in the output. Defaults to nchunks.disk.frame(df), the number of chunks in df

overwrite

whether to overwrite the output directory

Examples

# NOT RUN {
library(disk.frame)

iris.df = as.disk.frame(iris, nchunks = 2)

# group iris.df by Species and ensure rows with the same Species are in the same chunk
iris_hard.df = hard_group_by(iris.df, Species)

get_chunk(iris_hard.df, 1)
get_chunk(iris_hard.df, 2)

# clean up iris.df and iris_hard.df
delete(iris.df)
delete(iris_hard.df)
# }
