These are the possible values of the parallelism argument to make().
parallelism_choices(distributed_only = FALSE)
distributed_only: logical, whether to return only the distributed backend types, such as 'Makefile' and 'future_lapply'.
Character vector listing the types of parallel computing supported.
Run make(..., parallelism = x, jobs = n) for any of the following values of x to distribute targets over parallel units of execution.
'parLapply': launches multiple processes in a single R session using parallel::parLapply(). This is single-node, (potentially) multicore computing. It incurs more overhead than the 'mclapply' option, but it works on Windows. If jobs is 1 in make(), then no cluster is created and no parallelism is used.
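As a minimal sketch of the single-node, multi-process model behind 'parLapply' (using only base R's parallel package, independently of drake):

```r
library(parallel)

cl <- makeCluster(2)                          # spawn 2 worker R processes
squares <- parLapply(cl, 1:4, function(x) x^2)  # distribute work over the cluster
stopCluster(cl)                               # always release the workers
unlist(squares)                               # 1 4 9 16
```

Because the workers are separate processes rather than forks, this pattern works on Windows as well as Unix-alikes, at the cost of setting up each worker from scratch.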
'mclapply': uses multiple processes in a single R session. This is single-node, (potentially) multicore computing. It does not work on Windows for jobs > 1 because mclapply() is based on forking.
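A minimal sketch of the forking model behind 'mclapply', again using only base R. The guard on the core count is needed because forking is unavailable on Windows:

```r
library(parallel)

# mclapply() forks the current R process, so mc.cores > 1 only
# works on Unix-alikes; on Windows it must stay at 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
squares <- mclapply(1:4, function(x) x^2, mc.cores = cores)
unlist(squares)                               # 1 4 9 16
```

Forked workers inherit the parent session's environment, which is why this option has lower overhead than 'parLapply'.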
'future_lapply': opens up a whole trove of parallel backends powered by the future and future.batchtools packages. First, set the parallel backend globally using future::plan(). Then, apply the backend to your drake_plan using make(..., parallelism = "future_lapply", jobs = ...).
But be warned: the environment for each target needs to be set up from scratch, so this backend type has higher overhead than either mclapply or parLapply. Also, the jobs argument only applies to the imports.
To set the maximum number of jobs, set the workers argument where it exists. For example, call future::plan(multisession(workers = 4)), then call make(your_plan, parallelism = "future_lapply"). You might also try options(mc.cores = jobs), or see future::.options for environment variables that set the maximum number of jobs.
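The two-step setup above can be sketched as follows. This assumes the future package is installed (the call is guarded so the snippet is safe to source without it), and the drake call is left commented since it needs a real drake_plan:

```r
# Step 1: set the parallel backend globally via future::plan().
# Here: 4 background R sessions (multisession works on all platforms).
if (requireNamespace("future", quietly = TRUE)) {
  future::plan(future::multisession, workers = 4)
}

# Step 2: apply the backend to your plan (requires drake):
# make(your_plan, parallelism = "future_lapply")
```

Note that with this backend the number of simultaneous targets is governed by the workers argument to the plan, not by jobs.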
'future': just like "future_lapply" parallelism, except that rather than using staged parallelism, each target begins as soon as its dependencies are ready and a worker is available. This greatly increases parallel efficiency. The jobs argument to make() works as intended: in make(jobs = 4, parallelism = "future"), up to 4 workers run at a time, so at most 4 targets build simultaneously. The "future" backend is experimental, so it is more likely to have bugs than its counterparts.
'Makefile': uses multiple R sessions by creating and running a Makefile. For distributed computing on a cluster or supercomputer, try make(..., parallelism = 'Makefile', prepend = 'SHELL=./shell.sh'). You need an auxiliary shell.sh file for this, and shell_file() writes an example.
Here, Makefile-level parallelism is only used for targets in your workflow plan data frame, not imports. To process imported objects and files, drake selects the best parallel backend for your system and uses the number of jobs you give to the jobs argument to make(). To use at most 2 jobs for imports and at most 4 jobs for targets, run make(..., parallelism = 'Makefile', jobs = 2, args = '--jobs=4').
Caution: the Makefile generated by make(..., parallelism = 'Makefile') is NOT standalone. DO NOT run it outside of make(). Also, Windows users will need to download and install Rtools.
# See all the parallel computing options.
parallelism_choices()
# See just the distributed computing options.
parallelism_choices(distributed_only = TRUE)