future.apply (version 0.1.0)

future_lapply: Apply a Function over a List or Vector via Futures

Description

Apply a Function over a List or Vector via Futures

Usage

future_lapply(X, FUN, ..., future.globals = TRUE, future.packages = NULL,
  future.seed = FALSE, future.lazy = FALSE, future.scheduling = 1)

future_sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)

Arguments

X

A vector-like object to iterate over.

FUN

A function taking at least one argument.

...

(optional) Additional arguments passed to FUN(). For future_*apply() functions, any future.* arguments that are part of ... are passed on to future_lapply(), if that is used internally.

future.globals

A logical, a character vector, or a named list controlling how globals are handled. For details, see the section "Global variables" below.

future.packages

(optional) a character vector specifying packages to be attached in the R environment evaluating the future.

future.seed

A logical or an integer (of length one or seven), or a list of length(X) with pre-generated random seeds. For details, see the section "Reproducible random number generation (RNG)" below.

future.lazy

Specifies whether the futures should be resolved lazily or eagerly (default).

future.scheduling

Average number of futures ("chunks") per worker. If 0.0, then a single future is used to process all elements of X. If 1.0 or TRUE, then one future per worker is used. If 2.0, then each worker will process two futures (if there are enough elements in X). If Inf or FALSE, then one future per element of X is used.

simplify, USE.NAMES

See base::sapply() for details.
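
The relationship between future.scheduling and the number of futures can be illustrated with a small base R sketch. The helper n_chunks() below is hypothetical and not part of future.apply, and the rounding rule it uses for non-integer values is an assumption. parallel::splitIndices() is used only to show one possible way of dividing the element indices into chunks.

```r
# Hypothetical helper (not part of future.apply): number of futures implied
# by a future.scheduling value, given the element and worker counts.
n_chunks <- function(n_elements, n_workers, scheduling = 1) {
  if (is.logical(scheduling)) {
    # TRUE behaves like 1.0 and FALSE like Inf, as described above
    scheduling <- if (isTRUE(scheduling)) 1 else Inf
  }
  if (scheduling == 0) return(1L)  # a single future processes all of X
  # Never more futures than elements; rounding up is an assumption
  chunks <- min(n_elements, scheduling * n_workers)
  max(1L, as.integer(ceiling(chunks)))
}

X <- 1:10
# One possible split of the indices of X into one chunk per worker (4 workers)
chunks <- parallel::splitIndices(length(X), n_chunks(length(X), n_workers = 4))
str(chunks)
```

With 10 elements and 4 workers, future.scheduling = 2 would correspond to 8 futures under this sketch, and future.scheduling = Inf to 10.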

Value

For future_lapply(), a list with the same length and names as X.

For future_sapply(), a vector with the same length and names as X. See base::sapply() for details.
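
As a brief illustration of the two return types, the sketch below applies the same function with both variants. It assumes the future.apply package is installed and uses whatever future plan is currently set. The input vector and the squaring function are made up for the example.

```r
library(future.apply)

X <- c(a = 1, b = 2, c = 3)

# A list with the same length and names as X
y_list <- future_lapply(X, FUN = function(x) x^2)
str(y_list)

# Simplified to a named numeric vector, as base::sapply() would do
y_vec <- future_sapply(X, FUN = function(x) x^2)
str(y_vec)
```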

Global variables

Argument future.globals may be used to control how globals are handled, similarly to how the globals argument is used with future(). Since all function calls use the same set of globals, this function can gather the globals once, upfront, which is more efficient than doing so for each future independently. If TRUE, NULL, or not specified (default), then globals are automatically identified and gathered. If a character vector of names is specified, then those globals are gathered. If a named list is specified, then those globals are used as is. In all cases, FUN and any ... arguments are automatically passed as globals to each future created, since they are always needed.
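
The three ways of specifying future.globals described above can be sketched as follows. The example assumes the future.apply package is installed and that the code runs at top level, so that the variable a is found from the calling environment. The names a and scale_by_a are made up for the illustration.

```r
library(future.apply)

a <- 2
scale_by_a <- function(x) a * x  # 'a' is a global variable of FUN

# Default: globals are identified and gathered automatically
y_auto <- future_lapply(1:3, FUN = scale_by_a)

# Character vector: gather the globals with these names
y_named <- future_lapply(1:3, FUN = scale_by_a, future.globals = "a")

# Named list: use the supplied values as is
y_list <- future_lapply(1:3, FUN = scale_by_a, future.globals = list(a = a))

stopifnot(all.equal(y_auto, y_named), all.equal(y_auto, y_list))
```

Naming the globals explicitly can be useful when automatic identification is expensive or picks up more than is needed.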

Reproducible random number generation (RNG)

Unless future.seed = FALSE, this function guarantees that the exact same sequence of random numbers is generated given the same initial seed / RNG state, regardless of the type of futures and the scheduling ("chunking") strategy used.

RNG reproducibility is achieved by pregenerating the random seeds for all iterations (over X) using L'Ecuyer-CMRG RNG streams. In each iteration, the corresponding seed is set before calling FUN(X[[ii]], ...). Note that for large length(X) this may introduce considerable overhead. As input (future.seed), a fixed seed (integer) may be given, either as a full L'Ecuyer-CMRG RNG seed (vector of 1+6 integers) or as a seed from which such a full L'Ecuyer-CMRG seed is generated. If future.seed = TRUE, then .Random.seed is used if it holds a L'Ecuyer-CMRG RNG seed, otherwise one is created randomly. If future.seed = NA, a L'Ecuyer-CMRG RNG seed is randomly created. If none of the function calls FUN(X[[ii]], ...) uses random number generation, then future.seed = FALSE may be used.

In addition to the above, it is possible to specify a pre-generated sequence of RNG seeds as a list such that length(future.seed) == length(X) and where each element is an integer seed that can be assigned to .Random.seed. Use this alternative with caution. Note that as.list(seq_along(X)) is not a valid set of such .Random.seed values.
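
To make concrete what a valid list of .Random.seed values could look like, here is a minimal sketch in base R that builds one L'Ecuyer-CMRG stream seed per element of X with parallel::nextRNGStream(). This is an illustration only: the initial seed 42 is arbitrary, and the scheme is not necessarily the one future_lapply() uses internally.

```r
# Switch to L'Ecuyer-CMRG, remembering the previous RNG kind for later
old_kind <- RNGkind("L'Ecuyer-CMRG")
set.seed(42)
seed <- .Random.seed  # integer vector of length 7 (1 kind code + 6 state values)

X <- 1:5
seeds <- vector("list", length(X))
for (ii in seq_along(X)) {
  seeds[[ii]] <- seed
  seed <- parallel::nextRNGStream(seed)  # jump to the next independent stream
}

# Restore the RNG kind that was active before
RNGkind(old_kind[1])

str(seeds)
```

A list built this way satisfies length(seeds) == length(X) and could then be passed as future.seed = seeds. In contrast, as.list(seq_along(X)) fails because each element is a single integer rather than a full seven-integer seed.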

In all cases except future.seed = FALSE, the RNG state of the calling R process after this function returns is guaranteed to be "forwarded one step" from the RNG state before the call, and in the same way regardless of the future.seed, future.scheduling and future strategy used. This guarantees that an R script calling future_lapply() multiple times is numerically reproducible given the same initial seed.

Examples

# NOT RUN {
## Regardless of the future plan, the number of workers, and
## where they are, the random numbers produced are identical

library(future.apply)

plan(multiprocess)
y1 <- future_lapply(1:5, FUN = rnorm, future.seed = 0xBEEF)
str(y1)

plan(sequential)
y2 <- future_lapply(1:5, FUN = rnorm, future.seed = 0xBEEF)
str(y2)

stopifnot(all.equal(y1, y2))
# }