systemPipeR (version 1.6.2)

clusterRun: Submit command-line tools to cluster

Description

Submits non-R command-line software to the queueing/scheduling systems of compute clusters using run specifications defined by functions similar to runCommandline. clusterRun can be used with most queueing systems since it is based on utilities from the BatchJobs package, which supports template files (*.tmpl) for defining the run parameters of the different schedulers. The path to the *.tmpl file needs to be specified in a configuration file provided under the conffile argument.

Usage

clusterRun(args, FUN=runCommandline, conffile = ".BatchJobs.R", template = "torque.tmpl", Njobs, runid = "01", resourceList)

Arguments

args
Object of class SYSargs.
FUN
Accepts functions such as runCommandline(args, ...), where the args argument is mandatory and needs to be of class SYSargs.
conffile
Path to the configuration file (default location: ./.BatchJobs.R). In its simplest form this file contains just one command, such as this line for the Torque scheduler: cluster.functions <- makeClusterFunctionsTorque("torque.tmpl"). For more detailed information visit this page: https://code.google.com/p/batchjobs/wiki/DortmundUsage
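For orientation, a minimal .BatchJobs.R configuration file for Torque could look as follows. This is a sketch, not the packaged file; the SLURM alternative shown in the comment assumes a matching slurm.tmpl template is available:

```r
## Minimal .BatchJobs.R sketch: register the scheduler-specific cluster
## functions and point BatchJobs to the corresponding template file.
cluster.functions <- makeClusterFunctionsTorque("torque.tmpl")

## For SLURM-based clusters, one would instead use something like:
## cluster.functions <- makeClusterFunctionsSLURM("slurm.tmpl")
```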
template
Template files for specific queueing/scheduling systems can be downloaded from here: https://github.com/tudo-r/BatchJobs/blob/master/examples/cfTorque/simple.tmpl
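To illustrate how resourceList entries are consumed, the following is a simplified sketch of a Torque template modeled on the simple.tmpl linked above. The placeholders use BatchJobs' brew syntax; the exact variable names (job.name, log.file, rscript) and the fields under resources$ should be checked against the downloaded template, and the resources$ fields must match the names passed via resourceList:

```sh
#!/bin/bash
## Sketch of a Torque *.tmpl template (assumed layout; verify against the
## official simple.tmpl). BatchJobs fills in the <%= ... %> placeholders.
#PBS -N <%= job.name %>
#PBS -o <%= log.file %>
#PBS -j oe
#PBS -l walltime=<%= resources$walltime %>,nodes=<%= resources$nodes %>,mem=<%= resources$memory %>

R CMD BATCH --no-save --no-restore "<%= rscript %>" /dev/stdout
```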
Njobs
Integer defining the number of cluster jobs. For instance, if args contains 18 command-line jobs and Njobs=9, the function will distribute them across 9 cluster jobs, each running 2 command-line jobs. To increase the number of CPU cores used by each process, set the corresponding argument of the command-line tool, e.g. the -p argument for TopHat.
runid
Run identifier used for log file to track system call commands. Default is "01".
resourceList
List reserving sufficient computing resources for each cluster job, including memory, number of nodes, CPU cores, walltime, etc. For details, consult the template file of the corresponding queueing/scheduling system.

Value

  • Object of class Registry, as well as files and directories created by the executed command-line tools.

References

For more details on BatchJobs, please consult the following pages:
http://sfb876.tu-dortmund.de/PublicPublicationFiles/bischl_etal_2012a.pdf
https://github.com/tudo-r/BatchJobs
http://goo.gl/k3Tu5Y

See Also

clusterRun replaces the older functions getQsubargs and qsubRun.

Examples

## Construct SYSargs object from param and targets files 
param <- system.file("extdata", "tophat.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)

## Execute SYSargs on single machine
runCommandline(args=args)

## Execute SYSargs on multiple machines of a compute cluster. The following
## example uses the conf and template files for the Torque scheduler. Please
## read the instructions above on how to obtain the corresponding files for
## other schedulers.
file.copy(system.file("extdata", ".BatchJobs.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "torque.tmpl", package="systemPipeR"), ".")
resources <- list(walltime="00:25:00", nodes=paste0("1:ppn=", cores(args)), memory="2gb")
reg <- clusterRun(args, conffile=".BatchJobs.R", template="torque.tmpl", Njobs=18, runid="01", resourceList=resources)

## Monitor progress of submitted jobs
showStatus(reg)
file.exists(outpaths(args))
sapply(1:length(args), function(x) loadResult(reg, x)) # Works once all jobs have completed successfully.

## Alignment stats
read_statsDF <- alignStats(fqpaths=infile1(args), bampaths=outpaths(args), fqgz=TRUE)
read_statsDF <- cbind(read_statsDF[targets$FileName,], targets)
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, quote=FALSE, sep="\t")
