These are the possible values of the parallelism argument to make().
parallelism_choices(distributed_only = FALSE)
distributed_only: logical, whether to return only the distributed backend types, such as 'Makefile' and 'future_lapply'.
Character vector listing the types of parallel computing supported.
Run make(..., parallelism = x, jobs = n) for any of the following values of x to distribute targets over parallel units of execution.
'parLapply': launches multiple processes in a single R session using parallel::parLapply(). This is single-node, (potentially) multicore computing. It incurs more overhead than the 'mclapply' option, but it works on Windows. If jobs is 1 in make(), then no cluster is created and no parallelism is used.
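As a minimal sketch of the single-node, multi-process model behind 'parLapply' (using only base R's parallel package, independently of drake):

```r
library(parallel)

cl <- makeCluster(2)                          # spawn 2 worker R processes
squares <- parLapply(cl, 1:4, function(x) x^2)  # distribute work over the cluster
stopCluster(cl)                               # always release the workers
unlist(squares)                               # 1 4 9 16
```

Because the workers are separate processes rather than forks, this pattern works on Windows as well as Unix-alikes, at the cost of setting up each worker from scratch.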
'mclapply': uses multiple processes in a single R session. This is single-node, (potentially) multicore computing. It does not work on Windows for jobs > 1 because mclapply() is based on forking.
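A minimal sketch of the forking model behind 'mclapply', again using only base R. The guard on the core count is needed because forking is unavailable on Windows:

```r
library(parallel)

# mclapply() forks the current R process, so mc.cores > 1 only
# works on Unix-alikes; on Windows it must stay at 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
squares <- mclapply(1:4, function(x) x^2, mc.cores = cores)
unlist(squares)                               # 1 4 9 16
```

Forked workers inherit the parent session's environment, which is why this option has lower overhead than 'parLapply'.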
'future_lapply': opens up a whole trove of parallel backends powered by the future and future.batchtools packages. First, set the parallel backend globally using future::plan(). Then, apply the backend to your drake_plan using make(..., parallelism = "future_lapply", jobs = ...).
But be warned: the environment for each target needs to be set up from scratch, so this backend type has higher overhead than either mclapply or parLapply. Also, the jobs argument only applies to the imports.
To set the maximum number of jobs, set the workers argument where it exists. For example, call future::plan(multisession(workers = 4)), then call make(your_plan, parallelism = "future_lapply"). You might also try options(mc.cores = jobs), or see future::.options for environment variables that set the maximum number of jobs.
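The two-step setup above can be sketched as follows. This assumes the future package is installed (the call is guarded so the snippet is safe to source without it), and the drake call is left commented since it needs a real drake_plan:

```r
# Step 1: set the parallel backend globally via future::plan().
# Here: 4 background R sessions (multisession works on all platforms).
if (requireNamespace("future", quietly = TRUE)) {
  future::plan(future::multisession, workers = 4)
}

# Step 2: apply the backend to your plan (requires drake):
# make(your_plan, parallelism = "future_lapply")
```

Note that with this backend the number of simultaneous targets is governed by the workers argument to the plan, not by jobs.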
'future': just like "future_lapply" parallelism, except that rather than using staged parallelism, each target begins as soon as its dependencies are ready and a worker is available. This greatly increases parallel efficiency. The jobs argument to make() works as intended: in make(jobs = 4, parallelism = "future"), up to 4 workers run at a time, so at most 4 targets build simultaneously. The "future" backend is experimental, so it is more likely to have bugs than its counterparts.
'Makefile': uses multiple R sessions by creating and running a Makefile. For distributed computing on a cluster or supercomputer, try make(..., parallelism = 'Makefile', prepend = 'SHELL=./shell.sh'). You need an auxiliary shell.sh file for this, and shell_file() writes an example.
Here, Makefile-level parallelism is only used for targets in your workflow plan data frame, not imports. To process imported objects and files, drake selects the best parallel backend for your system and uses the number of jobs you give to the jobs argument to make(). To use at most 2 jobs for imports and at most 4 jobs for targets, run make(..., parallelism = 'Makefile', jobs = 2, args = '--jobs=4').
Caution: the Makefile generated by make(..., parallelism = 'Makefile') is NOT standalone. DO NOT run it outside of make(). Also, Windows users will need to download and install Rtools.
# See all the parallel computing options.
parallelism_choices()
# See just the distributed computing options.
parallelism_choices(distributed_only = TRUE)