map: Apply the same function to all chunks

Description

Apply the same function to all chunks

`imap.disk.frame` accepts a two-argument function: the first argument is the chunk as a data.frame and the second is the chunk ID
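The difference between the one-argument and two-argument forms can be sketched in base R. This is only an illustration of the calling convention, not disk.frame's implementation: the chunks are modelled as an in-memory list of data.frames (an assumption; disk.frame keeps chunks on disk), and the names `chunks`, `first_rows`, `with_id`, and `combined` are hypothetical.

```r
# Model a chunked data set as a plain list of data.frames.
# (Assumption for illustration only: two chunks of 25 rows each.)
chunks <- split(cars, rep(1:2, each = 25))

# map-style: a one-argument function applied to every chunk
first_rows <- lapply(chunks, function(chunk) chunk[1, ])

# imap-style: a two-argument function that receives the chunk and its ID
with_id <- Map(function(chunk, id) {
  chunk$id <- id
  chunk
}, chunks, seq_along(chunks))

# map_dfr-style: row-bind the per-chunk results into one data.frame
combined <- do.call(rbind, first_rows)
```

Here `combined` has one row per chunk, and every row of `with_id[[2]]` carries the ID of the second chunk.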

`lazy` is a convenience function that applies `.f` to every chunk lazily; it is equivalent to `map(..., lazy = TRUE)`

`delayed` is an alias for `lazy`, named for consistency with Dask and Dagger.jl
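As a rough mental model of lazy application, the function can be recorded first and only run when the result is requested. The sketch below uses base R only; `lazy_map` and `collect_chunks` are hypothetical helpers invented for this illustration and are not part of disk.frame, whose internals may differ.

```r
# Hypothetical helper: record the function instead of running it.
lazy_map <- function(chunks, f) {
  structure(list(chunks = chunks, f = f), class = "lazy_chunks")
}

# Hypothetical helper: run the recorded function on each chunk and row-bind.
collect_chunks <- function(x) {
  do.call(rbind, lapply(x$chunks, x$f))
}

# Chunks modelled as an in-memory list of data.frames (assumption).
chunks <- split(cars, rep(1:2, each = 25))

pending <- lazy_map(chunks, function(chunk) chunk[1, ])  # nothing computed yet
result <- collect_chunks(pending)                         # the work happens here
```

Deferring the work this way lets several chunk-wise steps be queued before any chunk is read.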

Usage

map(.x, .f, ...)
# S3 method for disk.frame
map(.x, .f, ..., outdir = NULL, keep = NULL,
  chunks = nchunks(.x), compress = 50, lazy = TRUE,
  overwrite = FALSE, vars_and_pkgs = future::getGlobalsAndPackages(.f,
  envir = parent.frame()), .progress = TRUE)
map_dfr(.x, .f, ..., .id = NULL)
# S3 method for default
map_dfr(.x, .f, ..., .id = NULL)
# S3 method for disk.frame
map_dfr(.x, .f, ..., .id = NULL, use.names = fill,
  fill = FALSE, idcol = NULL)
imap(.x, .f, ...)
# S3 method for default
imap(.x, .f, ...)
# S3 method for disk.frame
imap(.x, .f, outdir = NULL, keep = NULL,
  chunks = nchunks(.x), compress = 50, lazy = TRUE,
  overwrite = FALSE, ...)
# S3 method for disk.frame
imap_dfr(.x, .f, ..., .id = NULL,
  use.names = fill, fill = FALSE, idcol = NULL)
imap_dfr(.x, .f, ..., .id = NULL)
# S3 method for default
imap_dfr(.x, .f, ..., .id = NULL)
lazy(.x, .f, ...)
# S3 method for disk.frame
lazy(.x, .f, ...)
delayed(.x, .f, ...)
# S3 method for disk.frame
delayed(.x, .f, ...)
chunk_lapply(...)

Arguments

.x

a disk.frame

.f

a function to apply to each of the chunks

...

for compatibility with `purrr::map`

outdir

the output directory

keep

the columns to keep from the input

chunks

The number of chunks to output

compress

the fst compression level, from 0 to 100

lazy

if TRUE, defer the computation until the result is collected

overwrite

if TRUE, removes any existing chunks in the output directory

vars_and_pkgs

variables and packages to send to a background session. These are usually detected automatically

.progress

A logical, for whether or not to print a progress bar for multiprocess, multisession, and multicore plans. From furrr

.id

not used

use.names

for map_dfr's call to data.table::rbindlist. See data.table::rbindlist

fill

for map_dfr's call to data.table::rbindlist. See data.table::rbindlist

idcol

for map_dfr's call to data.table::rbindlist. See data.table::rbindlist
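The effect of `use.names`, `fill`, and `idcol` can be seen by calling `data.table::rbindlist` directly on a small list. This is a sketch under the assumption that the per-chunk results are already in memory as data.tables; the values and the column name `chunk` are made up for illustration.

```r
library(data.table)

# Two per-chunk results with different columns (made-up values).
res <- list(
  data.table(speed = 4, dist = 2),
  data.table(speed = 15)
)

# use.names = TRUE matches columns by name, fill = TRUE pads the missing
# `dist` column with NA, and idcol adds a column recording which list
# element each row came from.
combined <- rbindlist(res, use.names = TRUE, fill = TRUE, idcol = "chunk")
```

With `fill = FALSE`, the same call would fail because the two list elements do not have the same number of columns.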

Examples


# NOT RUN {
cars.df = as.disk.frame(cars)

# return the first row of each chunk lazily
cars2 = map(cars.df, function(chunk) {
 chunk[1,]
})

collect(cars2)

# same as above but using purrr-style formula syntax
cars2 = map(cars.df, ~.x[1,])

collect(cars2)

# return the first row of each chunk eagerly as list
map(cars.df, ~.x[1,], lazy = FALSE)

# return the first row of each chunk eagerly as data.table/data.frame by row-binding
map_dfr(cars.df, ~.x[1,])

# lazy and delayed are just aliases for map(..., lazy = TRUE)
collect(lazy(cars.df, ~.x[1,]))
collect(delayed(cars.df, ~.x[1,]))

# clean up cars.df
delete(cars.df)
cars.df = as.disk.frame(cars)

# .x is the chunk and .y is the ID as an integer

# lazy = TRUE support is not available at the moment
imap(cars.df, ~.x[, id := .y], lazy = FALSE)

imap_dfr(cars.df, ~.x[, id := .y])

# clean up cars.df
delete(cars.df)
# }
