datadr (version 0.8.4)

localDiskControl: Specify Control Parameters for MapReduce on a Local Disk Connection

Description

Specify control parameters for a MapReduce job on a local disk connection. Currently the parameters are the parallel cluster and the three buffer sizes described under Arguments.

Usage

localDiskControl(cluster = NULL, map_buff_size_bytes = 10485760,
  reduce_buff_size_bytes = 10485760, map_temp_buff_size_bytes = 10485760)

Arguments

cluster
a "cluster" object obtained from makeCluster to allow for parallel processing
map_buff_size_bytes
the amount of data, in bytes, sent to each map task (default 10485760, i.e. 10 MiB)
reduce_buff_size_bytes
the amount of data, in bytes, sent to each reduce task (default 10485760, i.e. 10 MiB)
map_temp_buff_size_bytes
the size, in bytes, of the chunks written to disk between the map and the reduce (default 10485760, i.e. 10 MiB)
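
The buffer arguments are plain byte counts, so it can help to derive them from a more readable unit. The following is a minimal sketch, not part of datadr: `mib_to_bytes` is a hypothetical helper, and the 50 MiB values are arbitrary illustrations rather than recommended settings. The call to `localDiskControl` is guarded so the snippet still runs when datadr is not installed.

```r
# Hypothetical helper (not part of datadr): convert mebibytes to bytes.
mib_to_bytes <- function(mib) {
  as.integer(mib * 1024^2)
}

# The documented default of 10485760 bytes corresponds to 10 MiB.
default_bytes <- mib_to_bytes(10)

# Assumed values for illustration only: larger map and reduce buffers,
# with the temporary map buffer left at the default size.
buffer_args <- list(
  map_buff_size_bytes = mib_to_bytes(50),
  reduce_buff_size_bytes = mib_to_bytes(50),
  map_temp_buff_size_bytes = default_bytes
)

# Build the control object only if datadr is available.
if (requireNamespace("datadr", quietly = TRUE)) {
  control <- do.call(datadr::localDiskControl, buffer_args)
}
```

The resulting `control` object would then be passed through the `control` argument of a local disk operation, as in the Examples section.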

Examples

# create a 2-node cluster that can be used to process in parallel
cl <- parallel::makeCluster(2)
# create a local disk control object that uses this cluster, so that
# operations given this control object run in parallel
control <- localDiskControl(cluster = cl)
# note that setting options(defaultLocalDiskControl = control)
# will cause this to be used by default in all local disk operations

# convert in-memory ddf to local-disk ddf
ldPath <- file.path(tempdir(), "by_species")
ldConn <- localDiskConn(ldPath, autoYes = TRUE)
bySpeciesLD <- convert(divide(iris, by = "Species"), ldConn)

# update attributes using parallel cluster
updateAttributes(bySpeciesLD, control = control)

# remove temporary directories
unlink(ldPath, recursive = TRUE)

# shut down the cluster
parallel::stopCluster(cl)
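
The comment about `options(defaultLocalDiskControl = control)` can be sketched with base R alone. This is an illustrative sketch under assumptions: the `control` list below is a stand-in for a real control object, not the actual structure returned by `localDiskControl`, and only the option name is taken from the example above.

```r
# Stand-in for a real control object (assumption for illustration); with
# datadr installed this would be the result of localDiskControl(cluster = cl).
control <- list(cluster = NULL, map_buff_size_bytes = 10485760)

# options() returns the previous values invisibly, so they can be restored.
old <- options(defaultLocalDiskControl = control)

# The option can be read back with getOption().
current <- getOption("defaultLocalDiskControl")

# Restore the earlier setting when done.
options(old)
```

Saving and restoring the previous value keeps the default from leaking into later code in the same R session.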
