Cluster functions for Docker/Docker Swarm (https://docs.docker.com/swarm/).
The submitJob function executes
docker [docker.args] run --detach=true [image.args] [resources] [image] [cmd].
Arguments docker.args, image.args and image can be set on construction.
The resources part takes the named resources ncpus and memory
from submitJobs and maps them to the arguments --cpu-shares and --memory
(in Megabytes). The resource threads is mapped to the environment variables “OMP_NUM_THREADS”
and “OPENBLAS_NUM_THREADS”.
To reliably identify jobs in the swarm, containers are labeled with “batchtools=[job.hash]” and “user=[login name]”,
and named using the current login name and the job hash.
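The mapping described above can be sketched in base R. This is an illustration only, not the batchtools source: build_run_args is a hypothetical helper, the image name and job hash in the usage lines are placeholders, and the container name is left out because its exact format is not specified here.

```r
# Hypothetical helper: assembles the argument vector for `docker run` in the
# order [docker.args] run --detach=true [image.args] [resources] [image] [cmd].
build_run_args <- function(image, job.hash, cmd,
                           resources = list(),
                           docker.args = character(0L),
                           image.args = character(0L),
                           user = Sys.info()[["user"]]) {
  res.args <- character(0L)
  if (!is.null(resources$ncpus))
    res.args <- c(res.args, sprintf("--cpu-shares=%i", as.integer(resources$ncpus)))
  if (!is.null(resources$memory))  # memory is given in megabytes
    res.args <- c(res.args, sprintf("--memory=%im", as.integer(resources$memory)))
  if (!is.null(resources$threads)) {
    n <- as.integer(resources$threads)
    res.args <- c(res.args,
      sprintf("--env=OMP_NUM_THREADS=%i", n),
      sprintf("--env=OPENBLAS_NUM_THREADS=%i", n))
  }
  # Labels used to find the container again later.
  labels <- c(
    sprintf("--label=batchtools=%s", job.hash),
    sprintf("--label=user=%s", user))
  c(docker.args, "run", "--detach=true", image.args, res.args, labels, image, cmd)
}

# Usage with placeholder values; prints the command line without running it.
args <- build_run_args(
  image = "rocker/r-ver", job.hash = "job0123abcd",
  cmd = c("Rscript", "--version"),
  resources = list(ncpus = 2, memory = 1024, threads = 4),
  user = "alice")
cat("docker", args, "\n")
```

The helper only builds the argument vector, so it can be inspected without a running Docker daemon.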
listJobsRunning uses docker [docker.args] ps --format={{.ID}} to filter for running jobs.
killJobs uses docker [docker.args] kill [batch.id] to kill running jobs.
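As a rough sketch, the two command lines could be assembled as follows. Both helpers are hypothetical names introduced for illustration, and the batch id and host address in the usage lines are placeholders.

```r
# Hypothetical helpers: argument vectors for listing and killing containers.
list_running_args <- function(docker.args = character(0L)) {
  # --format={{.ID}} makes `docker ps` print one container ID per line.
  c(docker.args, "ps", "--format={{.ID}}")
}

kill_args <- function(batch.id, docker.args = character(0L)) {
  c(docker.args, "kill", batch.id)
}

# Usage with a placeholder batch id and a placeholder remote host.
cat("docker", list_running_args(), "\n")
cat("docker", kill_args("0123456789ab", docker.args = c("--host", "tcp://127.0.0.1:2375")), "\n")
```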
These cluster functions use a Hook to remove finished jobs before a new submit and every time the Registry
is synchronized (using syncRegistry).
This is currently required because docker does not remove terminated containers automatically.
Use docker ps -a --filter 'label=batchtools' --filter 'status=exited' to identify terminated
containers and remove them manually, e.g. with docker rm (or use a cron job).
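A manual cleanup along these lines might look as follows in base R. This is a sketch under assumptions, not the hook used by these cluster functions: exited_args and rm_args are hypothetical helpers, and the --quiet flag is added here so that docker ps prints only container IDs.

```r
# Hypothetical helper: arguments to list exited containers labeled by batchtools.
exited_args <- function(docker.args = character(0L)) {
  c(docker.args, "ps", "-a", "--quiet",
    "--filter", "label=batchtools",
    "--filter", "status=exited")
}

# Hypothetical helper: arguments to remove the given containers (NULL if none).
rm_args <- function(ids, docker.args = character(0L)) {
  if (length(ids) == 0L) return(NULL)
  c(docker.args, "rm", ids)
}

# Print the commands instead of running them; the IDs are placeholders.
cat("docker", exited_args(), "\n")
cat("docker", rm_args(c("0123456789ab", "ba9876543210")), "\n")
```

To actually run the commands, the argument vectors could be passed to system2("docker", ...), with stdout = TRUE on the first call to capture the IDs.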
makeClusterFunctionsDocker(
image,
docker.args = character(0L),
image.args = character(0L),
scheduler.latency = 1,
fs.latency = 65
)
image [character(1)]
Name of the docker image to run.
docker.args [character]
Additional arguments passed to “docker” *before* the command (“run”, “ps” or “kill”) to execute (e.g., the docker host).
image.args [character]
Additional arguments passed to “docker run” (e.g., to define mounts or environment variables).
scheduler.latency [numeric(1)]
Time to sleep after important interactions with the scheduler to ensure a sane state.
Currently only triggered after calling submitJobs.
fs.latency [numeric(1)]
Expected maximum latency of the file system, in seconds.
Set to a positive number for network file systems like NFS which enables more robust (but also more expensive) mechanisms to
access files and directories.
Usually safe to set to 0 to disable the heuristic, e.g. if you are working on a local file system.
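A minimal construction example, assuming the batchtools package is installed. The image name and the mount path are placeholders chosen for illustration and are not part of this documentation.

```r
# Placeholder values (assumptions): any image with R and batchtools installed
# would do, and the mounted path should be wherever the registry lives.
image <- "rocker/r-ver"
docker.args <- character(0L)   # e.g. c("--host", "<docker host>") for a remote daemon
image.args <- c("--volume", "/tmp/registry:/tmp/registry")

# Only construct the cluster functions if batchtools is available.
if (requireNamespace("batchtools", quietly = TRUE)) {
  cf <- batchtools::makeClusterFunctionsDocker(
    image = image,
    docker.args = docker.args,
    image.args = image.args
  )
  print(cf)
}
```

The resulting object would typically be assigned to the cluster.functions slot of a registry, for example in the batchtools configuration file.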
Other ClusterFunctions:
makeClusterFunctionsInteractive(),
makeClusterFunctionsLSF(),
makeClusterFunctionsMulticore(),
makeClusterFunctionsOpenLava(),
makeClusterFunctionsSGE(),
makeClusterFunctionsSSH(),
makeClusterFunctionsSlurm(),
makeClusterFunctionsSocket(),
makeClusterFunctionsTORQUE(),
makeClusterFunctions()