Parallel Runs of Reverse Depends
Reverse depends for a given package are queued such that multiple
workers can run the tests in parallel.
Parallel Running [of] Reverse Depends
R packages available via the CRAN mirror network are of consistently high quality and tend to "Just Work". One of the many reasons for this is a strong culture of "do not break other packages", which the CRAN maintainers enforce. Package maintainers are expected to do their part by checking their packages.
To take one example, Rcpp is a package with a pretty large tail of reverse dependencies: as of this writing in late 2017, about 1270 other packages use it. So 1270 other packages need to be tested. This takes time, especially when running serially. But it is easy to parallelise.
Previously, a few ad-hoc scripts (available here) were used for a number of packages. The scripts were one-offs and did their job. But with the idea of running jobs in parallel, the liteq package by Gabor Csardi fit the requirements nicely.
The first operation is to enqueue jobs. In the simplest form we do (assuming the included script
can be found on the shell's search path)
$ enqueueJobs -q queueDirectory Rcpp
The same operation can also be done from R itself, see
help(enqueueJobs). A package name has to
be supplied, a directory name (for the queue directory) is optional. This function uses two base R
functions to get all available packages, and to determine the (non-recursive, first-level) reverse
dependencies of the given package. These are then added to the queue as "jobs".
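As an illustration of the lookup described above, here is a minimal sketch. It assumes the two base R functions are `utils::available.packages()` and `tools::package_dependencies()`; the source does not name them, so treat that as a guess. To stay runnable offline, a small hand-made package database stands in for the CRAN index; `pkgB` and `pkgC` are hypothetical package names.

```r
# Toy package database standing in for utils::available.packages(), so the
# sketch runs without network access. pkgB and pkgC are hypothetical names.
db <- rbind(
  c(Package = "Rcpp",          Depends = NA,     Imports = NA,              LinkingTo = NA),
  c(Package = "RcppArmadillo", Depends = NA,     Imports = "Rcpp",          LinkingTo = "Rcpp"),
  c(Package = "pkgB",          Depends = "Rcpp", Imports = NA,              LinkingTo = NA),
  c(Package = "pkgC",          Depends = NA,     Imports = "RcppArmadillo", LinkingTo = NA)
)
rownames(db) <- db[, "Package"]

# First-level (non-recursive) reverse dependencies of Rcpp.
# pkgC only depends on Rcpp indirectly, so it is not part of the result.
revdeps <- tools::package_dependencies("Rcpp", db = db,
                                       reverse = TRUE,
                                       recursive = FALSE)[["Rcpp"]]
print(revdeps)
```

Against the real index, `db` would be replaced by the result of `available.packages()`, and each element of `revdeps` would become one job in the queue.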
Running the queued jobs is the second operation, and it can be done in parallel. In other words, in several shells do
$ dequeueJobs -q queueDirectory Rcpp
which will find the (current) queue file in the specified directory for the given package, here
Rcpp. Again, this can also be done from an R prompt if you prefer, see help(dequeueJobs).
Each worker, when idle, goes to the queue and requests a job, which it then works on by testing the reverse dependency it was handed. Once done, the worker is idle again, returns to the queue, and the cycle repeats.
As there is absolutely no interdependence between the tests, this parallelises easily, up to the resource limits of the machine.
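The worker cycle described above can be sketched directly with liteq. This is not the code used by `dequeueJobs`; it is a minimal illustration that assumes the liteq package is installed. The queue name, the package names, and the `checkPackage()` helper are all hypothetical placeholders.

```r
library(liteq)

# Throwaway queue database for the demonstration.
db <- tempfile(fileext = ".sqlite")
q <- ensure_queue("jobs", db = db)

# Enqueue three jobs; the package names are placeholders.
for (pkg in c("pkgA", "pkgB", "pkgC")) {
  publish(q, title = pkg, message = pkg)
}

# Hypothetical stand-in for the actual reverse-dependency test of one package.
checkPackage <- function(pkg) {
  Sys.sleep(0.1)
  TRUE
}

done <- character()
repeat {
  msg <- try_consume(q)          # NULL when no job is ready
  if (is.null(msg)) break
  ok <- tryCatch(checkPackage(msg$title), error = function(e) FALSE)
  if (ok) ack(msg) else nack(msg)  # mark the job as finished or failed
  done <- c(done, msg$title)
}
print(done)
```

Running the same loop from several R sessions against the same database file gives each session its own jobs, since a message handed to one consumer is no longer ready for the others.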
To illustrate, "wall time" for a reverse-dependency check of
Rcpp decreased from 14.91 hours to 3.75 hours (or
almost four-fold) using six workers. An earlier run of
RcppArmadillo decreased from 5.87 hours to
1.92 hours (or just over three-fold) using four workers, and to 1.29 hours (or about 4.5-fold) using six
workers (and a fresh
here for its
impact). In all cases the machine used was generally not idle.
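The speed-up factors quoted above follow from dividing the serial wall time by the parallel wall time. The short calculation below reproduces them from the reported hours; the per-worker column is an added convenience and not a figure from the original measurements.

```r
# Wall times in hours, as reported above.
wall <- data.frame(
  package        = c("Rcpp", "RcppArmadillo", "RcppArmadillo"),
  workers        = c(6, 4, 6),
  serial_hours   = c(14.91, 5.87, 5.87),
  parallel_hours = c(3.75, 1.92, 1.29)
)

# Speed-up is serial time over parallel time.
wall$speedup <- wall$serial_hours / wall$parallel_hours

# Speed-up per worker: 1 would mean perfectly linear scaling.
wall$per_worker <- wall$speedup / wall$workers

print(wall, digits = 3)
```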
The following screenshot shows a run for RcppArmadillo with six workers. It shows the successes in green, skipped jobs in blue (from packages which sometimes would result in runaway tests) and no failures (which would be shown in red).
The scripts read their settings from an internal YAML file, accessed via the
config package by JJ. The locations searched for this file include
/etc/R/prrd.yaml. For my initial
tests I used these values:
## config for prrd package
default:
  setup: "~/git/prrd/local/setup.R"
  workdir: "/tmp/prrd"
  libdir: "/tmp/prrd/lib"
  debug: "false"
  verbose: "false"
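For readers unfamiliar with the config package, the following sketch shows how such a file can be read. It assumes the config package is installed and writes the values above to a temporary file, so nothing under /etc or the home directory is touched; how prrd itself locates and reads the file is not shown here.

```r
# Write the example configuration to a temporary file.
yaml_file <- tempfile(fileext = ".yaml")
writeLines(c(
  "default:",
  "  setup: \"~/git/prrd/local/setup.R\"",
  "  workdir: \"/tmp/prrd\"",
  "  libdir: \"/tmp/prrd/lib\"",
  "  debug: \"false\"",
  "  verbose: \"false\""
), yaml_file)

# Read the 'default' configuration; the result behaves like a named list.
cfg <- config::get(file = yaml_file)
print(cfg$workdir)
print(cfg$libdir)
```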
The workdir and libdir variables specify where tests are run, and which additional library
directory is used. A more interesting variable is
setup, which points to a helper script which
gets sourced. This permits setting of the CRAN repo address as well as of additional environment
variables etc. as needed for tests. My current script is
in the repository.
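To give an idea of what such a helper script might contain, here is a hypothetical minimal version. It is not the script from the repository: the mirror URL and the environment variable are illustrative choices only.

```r
## Hypothetical setup.R, sourced before the tests are run.

# Point R at a CRAN mirror (illustrative choice of mirror).
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Example of an environment variable that affects 'R CMD check':
# do not fail when suggested packages are unavailable.
Sys.setenv("_R_CHECK_FORCE_SUGGESTS_" = "FALSE")
```

Keeping these settings in a sourced script rather than in the YAML file means arbitrary R code can be used to prepare the test environment.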
While the package is new, it has already been used for a few complete reverse depends tests runs.
The package is not yet on CRAN, but may be uploaded "soon".
GPL (>= 2)
Functions in prrd
|summariseQueue||Summarise results from a reverse-dependency check|
|runSanityChecks||Various Helper Functions|
|getDatabaseConnection||Database Helper Functions|
|dequeueJobs||Dequeue and run reverse-dependency checks, possibly in parallel|
|enqueueJobs||Enqueues reverse-dependent packages|