mcprogress (version 0.1.1)

pmclapply: mclapply with progress bar

Description

pmclapply() adds a progress bar to mclapply() in RStudio or Linux environments by writing output to the console. It is designed to add very little overhead.

Usage

pmclapply(
  X,
  FUN,
  ...,
  progress = TRUE,
  spinner = FALSE,
  title = "",
  eta = TRUE,
  mc.preschedule = TRUE,
  mc.set.seed = TRUE,
  mc.silent = FALSE,
  mc.cores = getOption("mc.cores", 2L),
  mc.cleanup = TRUE,
  mc.allow.recursive = TRUE,
  affinity.list = NULL
)

Value

A list of the same length as X and named by X.
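
As a small illustration of the return shape, the sketch below uses parallel::mclapply() directly, on the assumption (taken from the Details section) that pmclapply() behaves identically. The input vectors are made up for the example, and mc.cores = 1L is used only so the snippet also runs on Windows. Names are taken from names(X), so an unnamed vector yields an unnamed list.

```r
library(parallel)

# Named input: the result carries the same names as X
x <- c(first = 1, second = 2, third = 3)
res_named <- mclapply(x, function(v) v^2, mc.cores = 1L)
names(res_named)    # "first" "second" "third"

# Unnamed input: the result has no names
res_unnamed <- mclapply(letters[1:3], toupper, mc.cores = 1L)
names(res_unnamed)  # NULL
```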

Arguments

X

a vector (atomic or list) or an expressions vector. Other objects (including classed objects) will be coerced by as.list().

FUN

the function to be applied via mclapply() to each element of X in parallel.

...

Optional arguments passed to FUN.

progress

Logical whether to show the progress bar.

spinner

Logical whether to show a spinner which moves each time a parallel process is completed. More useful if the length of time for each process to complete is variable.

title

Title for the progress bar.

eta

Logical whether to show estimated time to completion.

mc.preschedule, mc.set.seed, mc.silent, mc.cleanup, mc.allow.recursive, affinity.list

See mclapply().

mc.cores

The number of cores to use, i.e. at most how many child processes will be run simultaneously. The option is initialized from environment variable MC_CORES if set. Must be at least one, and parallelization requires at least two cores.
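
The default for mc.cores is read from the mc.cores option, falling back to 2L when the option is unset. The sketch below shows that lookup using base R only. The value 4L is an arbitrary choice for illustration, and the option is restored afterwards.

```r
# Set the option for the session; keep the old value so it can be restored
old <- options(mc.cores = 4L)
cores_set <- getOption("mc.cores", 2L)      # 4: the option is set

# With the option removed, the fallback of 2L is used
options(mc.cores = NULL)
cores_default <- getOption("mc.cores", 2L)  # 2: the option is unset

# Restore the previous setting
options(old)
```

Setting the option once per session avoids passing mc.cores to every call.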

Author

Myles Lewis

Details

This function can be used in an identical manner to mclapply(). It is ideal when the length of X is considerably greater than the number of cores. As processes are spawned in a block and most code for each process completes at roughly the same time, processes move along in blocks whose size is determined by mc.cores. To keep overhead low, pmclapply() only tracks every nth process, where n = mc.cores. For example, with 4 cores, pmclapply() reports progress when the 4th, 8th, 12th, 16th, etc. process has completed. If the length of X is very large (e.g. in the thousands), the progress bar only updates for each 1% of progress in order to reduce overhead.
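
The reporting rule described above can be sketched in base R. This is not the package's implementation: report_points() and throttled_points() are hypothetical helpers written for this illustration, and the 1% throttle is one plausible reading of the rule (keep the first report in each whole-percent step).

```r
# Hypothetical helper: indices of X whose completion triggers a progress report,
# assuming a report after every `cores`-th element
report_points <- function(n, cores) {
  idx <- seq_len(n)
  idx[idx %% cores == 0L]
}

report_points(20, 4)  # 4 8 12 16 20

# Hypothetical helper: for large n, keep only the first report in each
# whole-percent step, so the bar is redrawn about 100 times at most
throttled_points <- function(n, cores) {
  pts <- report_points(n, cores)
  pct <- floor(100 * pts / n)
  pts[!duplicated(pct)]
}

length(report_points(10000, 4))     # 2500 candidate reports
length(throttled_points(10000, 4))  # 101 after throttling
```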

However, in some scenarios the length of X is comparable to the number of cores and each process may take a long time. For example, machine learning applied to each of 8 cross-validation folds on an 8-core machine will open 8 processes from the outset, and each process will often complete at roughly the same time. In this case pmclapply() is much less informative: it only shows completion at the end of one round of processes, so the bar goes from 0% straight to 100%. For this scenario, we recommend mcProgressBar(), which allows more fine-grained reporting of subprogress from within a block of parallel processes.
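
To make the cross-validation case concrete, the sketch below computes the percentages a bar would show if it reported after every mc.cores-th element. progress_steps() is a hypothetical helper for illustration only, not a function from mcprogress.

```r
# Hypothetical helper: percentages displayed when progress is reported
# after every `cores`-th element of an input of length n
progress_steps <- function(n, cores) {
  done <- seq_len(n)
  done <- done[done %% cores == 0L]
  round(100 * done / n)
}

progress_steps(8, 8)   # 100: a single jump from 0% to 100%
progress_steps(20, 4)  # 20 40 60 80 100
```

With as many folds as cores there is only one reporting point, which is why reporting subprogress from inside each process is the better fit there.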

ETA is approximate. To minimise overhead, it is only recalculated with each change in progress (i.e. each time a block of processes completes). It is not updated by interrupt, so the displayed estimate stays fixed between progress updates.

See Also

mclapply(), mcProgressBar()

Examples

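# mclapply() relies on forking, which is not available on Windows
if (Sys.info()["sysname"] != "Windows") {

  res <- pmclapply(letters[1:20], function(i) {
    # Simulate work of slightly variable duration
    Sys.sleep(0.2 + runif(1) * 0.1)
    # Return 5 random values named a1..a5, b1..b5, etc.
    setNames(rnorm(5), paste0(i, 1:5))
  }, mc.cores = 2, title = "Working")

}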
