WARNING: Please note that this sets up a stateless set of cluster nodes,
which means that clusterEvalQ(cl, { a <- 3.14 })
will have no effect.
Consider this a first beta version and use it with great care,
particularly because of the stateless nature of the cluster.
For now, I recommend manually validating that this cluster type gives
results identical to those from the classical parallel::makeCluster()
cluster type.
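One way to carry out the validation recommended above is to run the same deterministic task on both cluster types and compare the results. The following is a minimal sketch, assuming an installed version of the future package that provides makeClusterFuture() and an R version that supports it; the task function square() and the variable names are hypothetical and chosen only for illustration.

```r
library(future)
library(parallel)

# A deterministic task, so that the two results can be compared exactly
xs <- 1:10
square <- function(x) x^2  # hypothetical example task

# Reference result from a classical PSOCK cluster
cl_ref <- makeCluster(2, type = "PSOCK")
y_ref <- parLapply(cl_ref, xs, square)
stopCluster(cl_ref)

# Same computation on a future cluster
plan(multisession, workers = 2)
cl_fut <- makeClusterFuture()
y_fut <- parLapply(cl_fut, xs, square)
plan(sequential)

# Compare the two results
same <- identical(y_ref, y_fut)
print(same)
```

A task that draws random numbers would additionally need the RNG streams set up the same way on both clusters before the results can be compared.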
makeClusterFuture(specs = nbrOfWorkers(), ...)
Returns a parallel cluster object of class FutureCluster.
specs: Ignored. If specified, the value should equal nbrOfWorkers()
(default). A missing value corresponds to specifying nbrOfWorkers().
This argument exists only to support
parallel::makeCluster(NA, type = future::FUTURE).
...: Named arguments passed to future().
Traditionally, a cluster node has a one-to-one mapping to a cluster
worker process. For example, cl <- makeCluster(2, type = "PSOCK")
launches two parallel worker processes in the background, where
cluster node cl[[1]] maps to worker #1 and node cl[[2]] to
worker #2, and that never changes through the lifespan of these
workers. This one-to-one mapping allows for deterministic
configuration of workers. For example, some code may assign globals
with values specific to each worker, e.g.
clusterEvalQ(cl[1], { a <- 3.14 }) and
clusterEvalQ(cl[2], { a <- 2.71 }).
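The node-specific configuration described above can be tried with the base parallel package alone. This is a small sketch of the traditional PSOCK behavior only, not of a future cluster; the variable name vals is introduced here for illustration.

```r
library(parallel)

# Two background workers; node cl[[i]] always maps to worker i
cl <- makeCluster(2, type = "PSOCK")

# Assign a node-specific value of 'a' on each worker
clusterEvalQ(cl[1], { a <- 3.14 })
clusterEvalQ(cl[2], { a <- 2.71 })

# Read 'a' back; element i of the result comes from node cl[[i]]
vals <- clusterEvalQ(cl, a)
str(vals)

stopCluster(cl)
```

Because each node keeps its own state between calls, the value read back from each node is the one assigned to that node.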
In contrast, there is no one-to-one mapping between cluster nodes and
the parallel workers when using a future cluster. This is because we
cannot make assumptions about where a parallel task will be processed.
Where a parallel task is processed is up to the future backend to
decide: some backends do this deterministically, whereas others
resolve each task on the first available worker. Also, the worker
processes might be transient for some future backends, i.e. they only
exist for the lifespan of the parallel task and then terminate.
Because of this, one must not rely on node-specific behaviors,
because that concept does not make sense with a future cluster.
To protect against this, any attempt to address a subset of future
cluster nodes results in an error, e.g. clusterEvalQ(cl[1], ...),
clusterEvalQ(cl[1:2], ...), and clusterEvalQ(cl[2:1], ...) in
the above example will all give an error.
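The protection described above can be observed by wrapping a subset call in tryCatch(). This is a minimal sketch, assuming an installed version of the future package that provides makeClusterFuture(); the names res and is_error are hypothetical, and the exact error message is not shown because it may differ between versions.

```r
library(future)
library(parallel)

plan(multisession, workers = 2)
cl <- makeClusterFuture()

# Addressing a subset of future cluster nodes is documented to fail;
# capture the condition instead of stopping the script
res <- tryCatch(
  clusterEvalQ(cl[1], { a <- 3.14 }),
  error = function(e) e
)

# Record whether an error was signaled
is_error <- inherits(res, "error")
print(is_error)

plan(sequential)
```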
Exceptions to the latter limitation are clusterSetRNGStream() and
clusterExport(), which can be safely used with future clusters.
See below for more details.
If clusterEvalQ() is called, the call is ignored, and a warning
is produced.
parallel::clusterSetRNGStream() distributes "L'Ecuyer-CMRG" RNG
streams to the cluster nodes, which record them such that the next
round of futures will use them. When used, the RNG state after the
futures are resolved is recorded accordingly, such that the next
round of futures will use it, and so on. This strategy
makes sure clusterSetRNGStream() has the expected effect although
futures are stateless.
parallel::clusterExport() assigns values to the cluster nodes.
Specifically, these values are recorded and are used as globals
for all futures created thereon.
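As an illustration of clusterExport() with a future cluster, the sketch below exports one variable and then uses it inside the function passed to parLapply(). It assumes an installed version of the future package that provides makeClusterFuture(); the variable offset and its value are hypothetical.

```r
library(future)
library(parallel)

plan(multisession, workers = 2)
cl <- makeClusterFuture()

# Hypothetical value that the parallel tasks depend on
offset <- 100

# Record 'offset' so that it is available as a global to later futures
clusterExport(cl, "offset", envir = environment())

# Each task adds the exported value to its input
y <- parLapply(cl, 1:3, function(x) x + offset)
str(y)

plan(sequential)
```

Passing envir = environment() makes clusterExport() look up the variable where the script runs, rather than only in the global environment.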
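if (FALSE) { # (getRversion() >= "4.4.0")
library(future)

# Resolve futures in parallel background R sessions
plan(multisession)
cl <- makeClusterFuture()

# Set up "L'Ecuyer-CMRG" RNG streams for the cluster nodes
parallel::clusterSetRNGStream(cl)

y <- parallel::parLapply(cl, 11:13, function(x) {
  message("Process ID: ", Sys.getpid())
  mean(rnorm(n = x))
})
str(y)

# Return to sequential processing
plan(sequential)
}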