SeqArray (version 1.10.1)

seqParallel: Apply Functions in Parallel

Description

Applies a user-defined function in parallel.

Usage

seqParallel(cl=getOption("seqarray.parallel", FALSE), gdsfile, FUN,
    split=c("by.variant", "by.sample", "none"), .combine="unlist",
    .selection.flag=FALSE, ...)

Arguments

cl
NULL or FALSE: serial processing; TRUE: parallel processing with the maximum number of cores minus one; a numeric value: the number of cores to be used; a cluster object for parallel processing, created by functions in the package parallel, such as makeCluster. See Details.
gdsfile
a SeqVarGDSClass object, such as the one returned by seqOpen
FUN
the function to be applied; it should have the form FUN(gdsfile, ...)
split
split the dataset by variant or by sample across the processes, or "none" for no split
.combine
defines how results from different processes are combined: "unlist" (the default) produces a vector containing all the atomic components; "list" returns a list of the results created by the processes; "none" returns nothing; or a function, like "+".
.selection.flag
TRUE: a logical vector of selection is passed as the second argument of FUN, i.e. FUN(gdsfile, selection, ...)
...
optional arguments to FUN
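
The three kinds of .combine value described above can be illustrated with plain base R. This is only a sketch of the semantics, not the package's internal code: `results` is a made-up stand-in for the values returned by two processes, and the pairwise fold shown for a function such as "+" is an assumption about how the combination proceeds.

```r
# Hypothetical per-process results: two workers, each returning a numeric vector
results <- list(c(0.1, 0.2), c(0.3, 0.4))

# .combine = "unlist": flatten everything into a single atomic vector
combined_unlist <- unlist(results, use.names = FALSE)

# .combine = "list": keep one list element per process
combined_list <- results

# .combine = a function such as "+": fold the per-process results together
# (assumed here to be a left-to-right pairwise reduction)
combined_sum <- Reduce(`+`, results)

print(combined_unlist)
print(length(combined_list))
print(combined_sum)
```

With "unlist" the per-process boundaries are lost, so it suits per-variant or per-sample statistics; a function such as "+" suits totals that each process computes over its own part of the data.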

Value

A vector or list of values.

Details

When cl is TRUE or a numeric value, forking techniques are used to create a new child process as a copy of the current R process, see ?parallel::mcfork. However, forking is not available on Windows, so serial processing is used instead. In order to use multiple processes on Windows, users have to create a cluster object via makeCluster.

It is strongly suggested to use seqParallel together with seqParallelSetup, which can work around the lack of forking on Windows.
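
The cluster route mentioned above can be sketched as follows: a cluster object is created with makeCluster and passed explicitly as cl, which does not rely on forking. This is a minimal sketch that assumes SeqArray and its example GDS file are installed; the cluster size of 2 is arbitrary, and the library(SeqArray) call inside FUN is a precaution in case the worker processes do not already have the package attached.

```r
library(parallel)
library(SeqArray)

# create a socket cluster with 2 workers (arbitrary size, works without forking)
cl <- makeCluster(2)

# open the example GDS file shipped with SeqArray
gdsfile <- seqOpen(seqExampleFileName("gds"))

# per-variant frequency of genotype code 0, computed on the cluster
afreq <- seqParallel(cl, gdsfile, FUN = function(f) {
        library(SeqArray)  # precaution: make sure the worker has the package
        seqApply(f, "genotype", as.is="double",
            FUN=function(x) mean(x==0, na.rm=TRUE))
    }, split = "by.variant")

length(afreq)

# clean up: close the file and stop the workers
seqClose(gdsfile)
stopCluster(cl)
```

Passing the cluster explicitly keeps its lifetime under the caller's control, at the cost of having to call stopCluster afterwards.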

See Also

seqSetFilter, seqGetData, seqApply, seqParallelSetup

Examples

library(parallel)

# choose an appropriate cluster size or number of cores
seqParallelSetup(2)


# the GDS file
(gds.fn <- seqExampleFileName("gds"))

# display
(gdsfile <- seqOpen(gds.fn))

# the uniprocessor version (cl=FALSE forces serial processing)
afreq1 <- seqParallel(cl=FALSE, gdsfile, FUN = function(f) {
        seqApply(f, "genotype", as.is="double",
            FUN=function(x) mean(x==0, na.rm=TRUE))
    }, split = "by.variant")

length(afreq1)
summary(afreq1)


# run in parallel: the empty first argument falls back to the setup made by seqParallelSetup(2)
afreq2 <- seqParallel(, gdsfile, FUN = function(f) {
        seqApply(f, "genotype", as.is="double",
            FUN=function(x) mean(x==0, na.rm=TRUE))
    }, split = "by.variant")

length(afreq2)
summary(afreq2)


# check
all(afreq1 == afreq2)


################################################################
# check -- variant splits

seqParallel(, gdsfile, FUN = function(f) {
        v <- seqGetFilter(f)
        sum(v$variant.sel)
    }, split = "by.variant")
# [1] 674 674


################################################################

# close the GDS file
seqClose(gdsfile)


# turn off the parallel setup created by seqParallelSetup(2)
seqParallelSetup(FALSE)
