shard (version 0.1.0)

stream_reduce: Stream over row-groups/datasets and reduce

Description

Applies f() to each partition (row-group) and combines results with combine() into a single accumulator. This keeps peak memory bounded by the largest single partition (plus your accumulator).
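The streaming-fold semantics can be sketched in base R (this is an illustrative model, not the shard implementation): each element of `partitions` stands in for a row-group, and only one partition plus the accumulator is held at a time.

```r
# Stand-ins for two row-groups; in shard these would be read lazily.
partitions <- list(data.frame(x = 1:3), data.frame(x = 4:5))

# Minimal sketch of the stream_reduce contract.
stream_reduce_sketch <- function(parts, f, init, combine, ...) {
  acc <- init
  for (p in parts) {
    acc <- combine(acc, f(p, ...))  # fold each per-partition value into acc
  }
  acc
}

stream_reduce_sketch(partitions, f = nrow, init = 0L, combine = `+`)
# 5: 3 rows + 2 rows
```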

Usage

stream_reduce(x, f, init, combine, ...)

# S3 method for shard_row_groups
stream_reduce(x, f, init, combine, ...)

# S3 method for shard_dataset
stream_reduce(x, f, init, combine, ...)

Value

The final accumulator value after processing all partitions.

Arguments

x

A shard_row_groups or shard_dataset handle.

f

A function with signature f(chunk, ...) that returns a per-partition value.

init

Initial accumulator value.

combine

A function with signature combine(acc, value) that folds a per-partition value into the accumulator and returns the updated accumulator.

...

Additional arguments passed on to f().
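Because combine() receives whatever f() returns, the accumulator can be a composite value. A hedged sketch using base R's Reduce() (names and the list-of-data-frames partitions are illustrative, not shard API): carry a running sum and row count, then finish with one division to get a streaming mean.

```r
# Stand-ins for row-groups.
parts <- list(data.frame(x = c(1, 2)), data.frame(x = c(3, 4, 5)))

# Per-partition value: partial sum and row count.
f <- function(chunk) list(s = sum(chunk$x), n = nrow(chunk))

# Fold two partial results together.
combine <- function(acc, v) list(s = acc$s + v$s, n = acc$n + v$n)

acc <- Reduce(combine, lapply(parts, f), init = list(s = 0, n = 0L))
acc$s / acc$n  # streaming mean: 3
```

The same f/init/combine triple would plug directly into stream_reduce() over a shard handle.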

Examples

# \donttest{
s <- schema(x = float64())                       # one float64 column
sink <- table_sink(s, mode = "row_groups")       # write in row-group mode
table_write(sink, 1L, data.frame(x = rnorm(5)))  # one 5-row partition
rg <- table_finalize(sink)                       # shard_row_groups handle

# Count rows: nrow() per partition, summed by `+`, starting from 0L.
total <- stream_reduce(rg, f = nrow, init = 0L, combine = `+`)
total  # 5
# }