bpvec: Parallel, vectorized evaluation

Description

bpvec applies FUN to subsets of X. Any type of object X is allowed, provided length, and [ are defined on X. FUN is a function such that length(FUN(X)) == length(X). The objects returned by FUN are concatenated by AGGREGATE (c() by default). The return value is FUN(X).

Usage

bpvec(X, FUN, ..., AGGREGATE=c, BPREDO=list(), BPPARAM=bpparam())

Arguments

Any object for which methods length and [ are implemented.

FUN

A function to be applied to subsets of X. The relationship between X and FUN(X) is 1:1, so that length(FUN(X, ...)) == length(X). The return value of separate calls to FUN are concatenated with AGGREGATE.

...

Additional arguments for FUN.

AGGREGATE

A function taking any number of arguments ... called to reduce results (elements of the ... argument of AGGREGATE from parallel jobs. The default, c, concatenates objects and is appropriate for vectors; rbind might be appropriate for data frames.

BPPARAM

An optional BiocParallelParam instance determining the parallel back-end to be used during evaluation, or a list of BiocParallelParam instances, to be applied in sequence for nested calls to BiocParallel functions.

BPREDO

A list of output from bpvec with one or more failed elements. When a list is given in BPREDO, bpok is used to identify errors, tasks are rerun and inserted into the original results.

Value

The result should be identical to FUN(X, ...) (assuming that AGGREGATE is set appropriately).When evaluation of individual elements of X results in an error, the result is a list with the same geometry (i.e., lengths()) as the split applied to X to create chunks for parallel evaluation; one or more elements of the list contain a bperror element, indicting that the vectorized calculation failed for at least one of the index values in that chunk.An error is also signaled when FUN(X) does not return an object of the same length as X; this condition is only detected when the number of elements in X is greater than the number of workers.

Details

This method creates a vector of indices for X that divide the elements as evenly as possible given the number of bpworkers() and bptasks() of BPPARAM. Indices and data are passed to bplapply for parallel evaluation.

The distinction between bpvec and bplapply is that bplapply applies FUN to each element of X separately whereas bpvec assumes the function is vectorized, e.g., c(FUN(x[1]), FUN(x[2])) is equivalent to FUN(x[1:2]). This approach can be more efficient than bplapply but requires the assumption that FUN takes a vector input and creates a vector output of the same length as the input which does not depend on partitioning of the vector. This behavior is consistent with parallel:::pvec and the ?pvec man page should be consulted for further details.

Examples

Run this code

showMethods("bpvec")

## ten tasks (1:10), called with as many back-end elements are specified
## by BPPARAM.  Compare with bplapply
fun <- function(v) {
    message("working")
    sqrt(v)
}
system.time(result <- bpvec(1:10, fun)) 
result

## invalid FUN -- length(class(X)) is not equal to length(X)
bptry(bpvec(1:2, class, BPPARAM=SerialParam()))

Run the code above in your browser using DataLab