apply and lapply: Parallel Apply and Lapply Functions

Description

These functions are parallel versions of apply(), lapply(), and sapply().

Usage

pbdApply(X, MARGIN, FUN, ..., pbd.mode = c("mw", "spmd", "dist"),
         rank.source = .pbd_env$SPMD.CT$rank.root,
         comm = .pbd_env$SPMD.CT$comm,
         barrier = TRUE)
pbdLapply(X, FUN, ..., pbd.mode = c("mw", "spmd", "dist"),
          rank.source = .pbd_env$SPMD.CT$rank.root,
          comm = .pbd_env$SPMD.CT$comm,
          bcast = FALSE, barrier = TRUE)
pbdSapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE,
          pbd.mode = c("mw", "spmd", "dist"),
          rank.source = .pbd_env$SPMD.CT$rank.root,
          comm = .pbd_env$SPMD.CT$comm,
          bcast = FALSE, barrier = TRUE)

Arguments

X

a matrix or array in pbdApply(), or a list in pbdLapply() and pbdSapply().

MARGIN

as in apply().

FUN

as in apply().

...

optional arguments to FUN.

simplify

as in sapply().

USE.NAMES

as in sapply().

pbd.mode

the mode of the distributed data X: "mw", "spmd", or "dist" (see Details).

rank.source

the rank of the source from which X is broadcast.

comm

a communicator number.

bcast

whether to broadcast the result to all ranks.

barrier

whether to call a barrier for all ranks.

Value

A list or matrix will be returned.

Details

All functions are mainly intended to be called in manager/workers mode (pbd.mode = "mw"), where they work the same as their serial versions.

If pbd.mode = "mw", the X on rank.source (the master) is redistributed to the processors (workers), FUN is applied to the new data, and the results are gathered to rank.source. Note that in SPMD the master is also one of the workers. The arguments in ... are also sent from rank.source via scatter().
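
The "mw" flow can be pictured with a serial emulation in base R. This is only a sketch: it does not call pbdMPI, n.ranks is a hypothetical number of ranks, and the contiguous row-block partition is an assumption made for illustration rather than the package's documented scheme.

```r
# Serial emulation of the "mw" flow; no MPI is involved.
n.ranks <- 2                                  # hypothetical number of ranks
x <- matrix(1:100, ncol = 10)                 # X as held by rank.source

# "Scatter": assign contiguous row blocks to ranks (assumed partition).
owner <- cut(seq_len(nrow(x)), breaks = n.ranks, labels = FALSE)
pieces <- lapply(seq_len(n.ranks),
                 function(r) x[owner == r, , drop = FALSE])

# Each "worker" applies FUN (here sum) to the rows of its own piece.
partial <- lapply(pieces, function(p) apply(p, 1, sum))

# "Gather": rank.source concatenates the partial results in rank order.
y <- unlist(partial)
```

Under these assumptions y matches the serial apply(x, 1, sum), which is the sense in which the "mw" mode works like the serial version.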

If pbd.mode = "spmd", the same copy of X is assumed to exist on all processors, and the original apply(), lapply(), or sapply() operates on a part of X on each processor. An allgather() or gather() call is required to aggregate the results manually.

If pbd.mode = "dist", a different X is assumed to exist on each processor, i.e. a "distinct or distributed" X, and the original apply(), lapply(), or sapply() operates on all of the local X. An allgather() or gather() call is required to aggregate the results manually.
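
The difference between the "spmd" and "dist" modes can be sketched serially in base R. Again this does not call pbdMPI: n.ranks is a hypothetical number of ranks, the block split of element indices is an assumption, and the final concatenation merely stands in for a gather() or allgather() call.

```r
# Serial emulation only; no MPI is involved and nothing here calls pbdMPI.
n.ranks <- 2                                   # hypothetical number of ranks

# "spmd": every rank holds the same X and processes only its share.
x.same <- split(1:100, rep(1:10, each = 10))
share <- split(seq_along(x.same),
               cut(seq_along(x.same), breaks = n.ranks, labels = FALSE))
spmd.local <- lapply(share, function(i) lapply(x.same[i], sum))

# "dist": every rank holds a different X and processes all of it.
x.dist <- lapply(seq_len(n.ranks) - 1,
                 function(rank) split((1:100) + 100 * rank,
                                      rep(1:10, each = 10)))
dist.local <- lapply(x.dist, function(x) lapply(x, sum))

# Manual aggregation, standing in for gather() or allgather().
spmd.all <- do.call(c, unname(spmd.local))     # 10 results in total
dist.all <- do.call(c, unname(dist.local))     # 10 results per rank
```

In this emulation the "spmd" results together cover X exactly once, while the "dist" results contain one full set per rank, so the aggregated "dist" output grows with the number of ranks.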

In SPMD programming, it is usually better to split the data into pieces in advance, so that X is a local matrix on each processor. In that case, the original apply() should be sufficient.

References

Programming with Big Data in R Website: http://r-pbd.org/

Examples

# NOT RUN {
### Save code in a file "demo.r" and run with 2 processors by
### SHELL> mpiexec -np 2 Rscript demo.r

spmd.code <- "
### Initial.
suppressMessages(library(pbdMPI, quietly = TRUE))
init()
.comm.size <- comm.size()
.comm.rank <- comm.rank()

### Example for pbdApply.
N <- 100
x <- matrix((1:N) + N * .comm.rank, ncol = 10)
y <- pbdApply(x, 1, sum, pbd.mode = \"mw\")
comm.print(y)

y <- pbdApply(x, 1, sum, pbd.mode = \"spmd\")
comm.print(y)

y <- pbdApply(x, 1, sum, pbd.mode = \"dist\")
comm.print(y)


### Example for pbdApply for 3D array.
N <- 60
x <- array((1:N) + N * .comm.rank, c(3, 4, 5))
dimnames(x) <- list(lat = paste(\"lat\", 1:3, sep = \"\"),
                    lon = paste(\"lon\", 1:4, sep = \"\"),
                    time = paste(\"time\", 1:5, sep = \"\"))
comm.print(x[,, 1:2])

y <- pbdApply(x, c(1, 2), sum, pbd.mode = \"mw\")
comm.print(y)

y <- pbdApply(x, c(1, 2), sum, pbd.mode = \"spmd\")
comm.print(y)

y <- pbdApply(x, c(1, 2), sum, pbd.mode = \"dist\")
comm.print(y)


### Example for pbdLapply.
N <- 100
x <- split((1:N) + N * .comm.rank, rep(1:10, each = 10))
y <- pbdLapply(x, sum, pbd.mode = \"mw\")
comm.print(unlist(y))

y <- pbdLapply(x, sum, pbd.mode = \"spmd\")
comm.print(unlist(y))

y <- pbdLapply(x, sum, pbd.mode = \"dist\")
comm.print(unlist(y))

### Finish.
finalize()
"
pbdMPI::execmpi(spmd.code, nranks = 2L)
# }
