Programming with Big Data -- Interface to MPI

An efficient interface to MPI that uses S4 classes and methods, with a focus on the Single Program/Multiple Data ('SPMD') parallel programming style, which is intended for batch parallel execution.



  • Author: See section below.

With few exceptions (ff, bigalgebra, etc.), R does computations in memory. When data becomes too large to handle in the memory of a single node, or when more processors than those offered in commodity hardware (~16) are needed for a job, a typical strategy is to add more nodes. MPI, or the "Message Passing Interface", is the standard for managing multi-node communication. pbdMPI is a package that greatly simplifies the use of MPI from R.

In pbdMPI, we make extensive use of R's S4 system to simplify the interface significantly. Instead of needing to specify the type (e.g., integer or double) of the data via function name (as in C implementations) or in an argument (as in Rmpi), you need only call the generic function on your data and we will always "do the right thing".
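As a small illustration of the point above, the sketch below calls the same generic, allreduce(), once on integer data and once on double data, without naming a type anywhere. The variable names and values are made up for the example.

```r
suppressMessages(library(pbdMPI, quietly = TRUE))
init()

n_ranks <- comm.size()

# One integer and one double per rank; the values are arbitrary.
x_int <- as.integer(comm.rank()) + 1L
x_dbl <- as.double(comm.rank()) + 0.5

# The same generic handles both inputs; dispatch is driven by the data.
sum_int <- allreduce(x_int, op = "sum")
sum_dbl <- allreduce(x_dbl, op = "sum")

# Rank 0 reports the storage type and value of each result.
comm.print(typeof(sum_int))
comm.print(sum_int)
comm.print(typeof(sum_dbl))
comm.print(sum_dbl)

finalize()
```

In a C program the same two reductions would need the MPI datatype spelled out for each call.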

In pbdMPI, we write programs in the "Single Program/Multiple Data" or SPMD style. Contrary to the way much of the R world is acquainted with parallelism, there is no "master" or "manager". Each process (MPI rank) runs the same copy of the program as every other process, but operates on its own data. This is arguably one of the simplest extensions of serial to massively parallel programming, and has been the standard way of doing things in the HPC community for over 20 years.
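A minimal sketch of the SPMD idea follows, assuming the work is a simple sum over the indices 1 to 100. Every rank runs the identical script; get.jid() gives each rank its own share of the indices, and allreduce() combines the partial results. The names n_items, my_ids, local_sum and total are illustrative only.

```r
suppressMessages(library(pbdMPI, quietly = TRUE))
init()

n_items <- 100L

# Every rank runs this same line, but get.jid() hands each rank
# a different subset of the indices 1..n_items.
my_ids <- get.jid(n_items)

# Work on the local share only.
local_sum <- sum(as.double(my_ids))

# Combine the partial sums so that every rank holds the total.
total <- allreduce(local_sum, op = "sum")

# Rank 0 prints the total, which is the sum of 1..100.
comm.print(total)

finalize()
```

No rank acts as a manager here: the division of work and the final reduction are both expressed in the one program that all ranks execute.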


If you are comfortable with MPI concepts, you should find pbdMPI very agreeable and simple to use. Below is a basic "hello world" program:

# load the package
suppressMessages(library(pbdMPI, quietly = TRUE))

# initialize the MPI communicators
init()

# Hello world
message <- paste("Hello from rank", comm.rank(), "of", comm.size())
comm.print(message, all.rank=TRUE, quiet=TRUE)

# shut down the communicators and exit
finalize()
Save this as, say, mpi_hello_world.r and run it via:

mpirun -np 4 Rscript mpi_hello_world.r

The function comm.print() is a "sugar" function custom to pbdMPI that makes it simple to print in a distributed environment. The argument all.rank=TRUE specifies that all MPI ranks should print, and the quiet=TRUE argument tells each rank not to "announce" itself when it does its printing.
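To make the effect of these arguments concrete, the sketch below prints the same message four ways. It also uses the rank.print argument of comm.print() to pick a single printing rank; the choice of the last rank is arbitrary and only for illustration.

```r
suppressMessages(library(pbdMPI, quietly = TRUE))
init()

msg <- paste("rank", comm.rank())

# Default: only rank 0 prints, and it announces itself first.
comm.print(msg)

# Every rank prints in turn, each one announcing itself.
comm.print(msg, all.rank = TRUE)

# Every rank prints, without the announcement line.
comm.print(msg, all.rank = TRUE, quiet = TRUE)

# Only the last rank prints; rank.print selects the rank.
comm.print(msg, rank.print = comm.size() - 1L)

finalize()
```

Running this under mpirun with a few ranks shows the difference between the variants more clearly than a single-process run.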

Numerous other examples can be found in the pbdMPI vignette, as well as in the pbdDEMO package and its corresponding vignette.


pbdMPI requires

  • R version 3.0.0 or higher
  • A system installation of MPI:
    • SUN HPC 8.2.1 (OpenMPI) for Solaris.
    • OpenMPI for Linux.
    • OpenMPI for Mac OS X.
    • MS-MPI for Windows.

The package can be installed from CRAN via the usual install.packages("pbdMPI"), or via the devtools package.


For additional installation information, see:

  • "INSTALL" for Solaris, Linux and Mac OS.
  • "INSTALL.win.*" for Windows.

More information about pbdMPI, including installation troubleshooting, can be found in:

  1. pbdMPI vignette at 'pbdMPI/inst/doc/pbdMPI-guide.pdf'.
  2. 'http://r-pbd.org/'.


pbdMPI is authored and maintained by the pbdR core team:

  • Wei-Chen Chen
  • George Ostrouchov
  • Drew Schmidt
  • Pragneshkumar Patel

With additional contributions from:

  • Hao Yu
  • Christian Heckendorf
  • Brian Ripley (Windows HPC Pack 2012)
  • The R Core team (some functions are modified from the base packages)

Functions in pbdMPI

Name                                    Description
pbdMPI-package                          Programming with Big Data -- Interface to MPI
communicator                            Communicator Functions
get job id                              Divide Job ID by Ranks
global print and cat                    Global Print and Cat Functions
Package Tools                           Functions for Get/Print MPI_COMM Pointer (Address)
Comm Internal Functions                 All Comm Internal Functions
gather-method                           A Rank Gathers Objects from Every Rank
reduce-method                           A Rank Receive a Reduction of Objects from Every Rank
send-method                             A Rank Send (blocking) an Object to the Other Rank
irecv-method                            A Rank Receives (Nonblocking) an Object from the Other Rank
wait                                    Wait Functions
Utility execmpi                         Execute MPI code in system
scatter-method                          A Rank Scatter Objects to Every Rank
recv-method                             A Rank Receives (Blocking) an Object from the Other Rank
probe                                   Probe Functions
global range, max, and min              Global Range, Max, and Min Functions
global sort                             Global Quick Sort for Distributed Vectors or Matrices
sourcetag                               Functions to Obtain source and tag
info                                    Info Functions
global base                             Global Base Functions
global distance function                Global Distance for Distributed Matrices
Task Pull                               Functions for Task Pull Parallelism
Get Configures Used at Compiling Time   Functions to Get MPI and/or pbdMPI Configures Used at Compiling Time
is.comm.null                            Check if a MPI_COMM_NULL
global writing                          Global Writing Functions
global Rprof                            A Rprof Function for SPMD Routines
global which, which.max, and which.min  Global Which Functions
apply and lapply                        Parallel Apply and Lapply Functions
allreduce-method                        All Ranks Receive a Reduction of Objects from Every Rank
bcast-method                            A Rank Broadcast an Object to Every Rank
alltoall                                All to All
sendrecv.replace-method                 Send and Receive an Object to and from Other Ranks
global balanc                           Global Balance Functions
global reading                          Global Reading Functions
global all pairs                        Global All Pairs
global any and all                      Global Any and All Functions
SPMD Control Functions                  Sets of controls in pbdMPI.
SPMD Internal Functions                 All SPMD Internal Functions
Set global pbd options                  Set Global pbdR Options
allgather-method                        All Ranks Gather Objects from Every Rank
isend-method                            A Rank Send (Nonblocking) an Object to the Other Rank
sendrecv-method                         Send and Receive an Object to and from Other Ranks
seed for RNG                            Seed Functions for Random Number Generators
global as.gbd                           Global As GBD Function
global match.arg                        Global Argument Matching
global pairwise                         Global Pairwise Evaluations
global stop and warning                 Global Stop and Warning Functions
global timer                            A Timing Function for SPMD Routines
SPMD Control                            Sets of controls in pbdMPI.
MPI array pointers                      Set or Get MPI Array Pointers in R
