disk.frame (version 0.3.7)

hard_arrange: Perform a hard arrange

Description

A hard_arrange is a sort that also reorganizes the chunks so that every unique value of `by` is in the same chunk. In other words, all rows that share the same `by` value end up in the same chunk.
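
The guarantee described above can be illustrated with plain data frames. The following is a minimal conceptual sketch in base R, not disk.frame's actual implementation: `hard_arrange_sketch` is a hypothetical helper that sorts the rows and then assigns each distinct key value to exactly one chunk.

```r
# Conceptual sketch only; hard_arrange_sketch is a hypothetical name
# and does not reflect how disk.frame works internally.
hard_arrange_sketch <- function(data, by, nchunks) {
  # Sort all rows by the key column.
  sorted <- data[order(data[[by]]), , drop = FALSE]
  # Map each distinct key (in sorted order) to a chunk id, so that
  # all rows sharing a key receive the same id.
  keys <- unique(sorted[[by]])
  chunk_of_key <- ceiling(seq_along(keys) * nchunks / length(keys))
  chunk_id <- chunk_of_key[match(sorted[[by]], keys)]
  # Split the sorted rows into an unnamed list of chunks.
  unname(split(sorted, chunk_id))
}

chunks <- hard_arrange_sketch(iris, "Species", nchunks = 2)

# Count how many chunks each Species value appears in.
species_in_chunk <- lapply(chunks, function(ch) as.character(unique(ch$Species)))
chunks_per_key <- table(unlist(species_in_chunk))
```

If the sketch behaves as intended, every entry of `chunks_per_key` is 1, meaning no Species value is spread over more than one chunk. When there are fewer distinct keys than requested chunks, this sketch simply produces fewer chunks.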

Usage

hard_arrange(df, ..., add = FALSE, .drop = FALSE)

# S3 method for data.frame
hard_arrange(df, ...)

# S3 method for disk.frame
hard_arrange(
  df,
  ...,
  outdir = tempfile("tmp_disk_frame_hard_arrange"),
  nchunks = disk.frame::nchunks(df),
  overwrite = TRUE
)

Arguments

df

a disk.frame

...

grouping variables

add

same as dplyr::arrange

.drop

same as dplyr::arrange

outdir

the output directory

nchunks

The number of chunks in the output. Defaults to nchunks(df), the number of chunks in the input.

overwrite

whether to overwrite the output directory

Examples

library(disk.frame)

iris.df = as.disk.frame(iris, nchunks = 2)

# arrange iris.df by Species and ensure rows with the same Species are in the same chunk
iris_hard.df = hard_arrange(iris.df, Species)

# inspect the two output chunks
get_chunk(iris_hard.df, 1)
get_chunk(iris_hard.df, 2)

# clean up iris.df and iris_hard.df
delete(iris.df)
delete(iris_hard.df)
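
A variation that passes the `outdir`, `nchunks`, and `overwrite` arguments from the disk.frame method explicitly and then checks that no Species value appears in both output chunks. This is a sketch, assuming the disk.frame package is installed; the output path and the variable names `out_path`, `species_1`, and `species_2` are arbitrary choices for illustration.

```r
library(disk.frame)

iris.df <- as.disk.frame(iris, nchunks = 2)

# Arbitrary output location chosen for this sketch.
out_path <- file.path(tempdir(), "iris_hard.df")

iris_hard.df <- hard_arrange(
  iris.df, Species,
  outdir = out_path,
  nchunks = 2,
  overwrite = TRUE
)

# Species values present in each output chunk.
species_1 <- unique(as.character(get_chunk(iris_hard.df, 1)$Species))
species_2 <- unique(as.character(get_chunk(iris_hard.df, 2)$Species))

# Should be empty if no Species value spans both chunks.
intersect(species_1, species_2)

# Clean up both disk.frames.
delete(iris.df)
delete(iris_hard.df)
```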
