Cluster functions for Docker/Docker Swarm (https://docs.docker.com/swarm/).
The submitJob function executes
docker [docker.args] run --detach=true [image.args] [resources] [image] [cmd].
The arguments docker.args, image.args and image can be set on construction.
The resources part takes the named resources ncpus and memory from submitJobs and maps them to the arguments --cpu-shares and --memory (in megabytes). The resource threads is mapped to the environment variables “OMP_NUM_THREADS” and “OPENBLAS_NUM_THREADS”.
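The mapping described above can be sketched in R roughly as follows. This is a minimal illustration under stated assumptions, not the batchtools implementation: build_docker_run() is a hypothetical helper, passing the thread settings with -e is one assumed way of getting them into the container, and the image name and command are placeholders.

```r
# Hypothetical helper: assemble the arguments for
#   docker [docker.args] run --detach=true [image.args] [resources] [image] [cmd]
# from a named list of resources. Nothing is executed here.
build_docker_run <- function(image, cmd, resources,
                             docker.args = character(0L),
                             image.args = character(0L)) {
  res.args <- c(
    # ncpus -> --cpu-shares, memory (in megabytes) -> --memory
    sprintf("--cpu-shares=%i", as.integer(resources$ncpus)),
    sprintf("--memory=%im", as.integer(resources$memory))
  )
  if (!is.null(resources$threads)) {
    # Assumption: the thread limit is handed to the container via -e
    threads <- as.integer(resources$threads)
    res.args <- c(res.args,
      sprintf("-e OMP_NUM_THREADS=%i", threads),
      sprintf("-e OPENBLAS_NUM_THREADS=%i", threads))
  }
  c(docker.args, "run", "--detach=true", image.args, res.args, image, cmd)
}

run.args <- build_docker_run(
  image = "my/r-image",       # placeholder image name
  cmd = "Rscript job.R",      # placeholder command
  resources = list(ncpus = 2, memory = 1024, threads = 4)
)
cat("docker", run.args, "\n")
```

Keeping the arguments as a character vector, rather than pasting one long string early, makes it easy to prepend docker.args or inspect individual flags before the command is run.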
To reliably identify jobs in the swarm, jobs are labeled with “batchtools=[job.hash]” and named
using the current login name (label “user”) and the job hash (label “batchtools”).
listJobsRunning uses docker [docker.args] ps --format={{.ID}} to filter for running jobs.
killJobs uses docker [docker.args] kill [batch.id] to terminate running jobs.
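As a rough sketch of how these two commands might be assembled, with docker.args placed before the subcommand: the helper names, the docker host and the container ID below are all hypothetical, and the helpers only build argument vectors.

```r
# Hypothetical helpers; they return argument vectors and execute nothing.
docker_ps_args <- function(docker.args = character(0L)) {
  c(docker.args, "ps", "--format={{.ID}}")
}
docker_kill_args <- function(batch.id, docker.args = character(0L)) {
  c(docker.args, "kill", batch.id)
}

host <- c("-H", "tcp://swarm.example.org:2375")  # placeholder docker host
print(docker_ps_args(host))
print(docker_kill_args("3f2a9c1b7d4e", host))    # placeholder container ID

# With a reachable docker daemon, the listing could be run with something like:
# ids <- system2("docker", docker_ps_args(host), stdout = TRUE)
```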
These cluster functions use a hook to remove finished jobs before a new submit and every time the Registry is synchronized (using syncRegistry).
This is currently required because docker does not remove terminated containers automatically.
Use docker ps -a --filter 'label=batchtools' --filter 'status=exited' to identify terminated containers and remove them manually with docker rm (or use a cron job).
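A manual cleanup along these lines could be scripted from R as sketched below. This is only an illustration: remove_finished() is a hypothetical helper, removal via docker rm is an assumption about how one would clean up, and the function does nothing if the docker command line client is not found.

```r
# Arguments for listing exited containers that carry the batchtools label.
# -q restricts the output to container IDs, one per line.
ps.args <- c("ps", "-a", "-q",
  "--filter", shQuote("label=batchtools"),
  "--filter", shQuote("status=exited"))

# Hypothetical cleanup helper; assumes a reachable docker daemon.
remove_finished <- function(docker.args = character(0L)) {
  if (!nzchar(Sys.which("docker"))) {
    message("docker not found, nothing to do")
    return(invisible(character(0L)))
  }
  ids <- suppressWarnings(
    system2("docker", c(docker.args, ps.args), stdout = TRUE, stderr = FALSE)
  )
  if (length(ids) > 0L) {
    # Remove the exited containers that were found
    system2("docker", c(docker.args, "rm", ids))
  }
  invisible(ids)
}

# remove_finished()  # uncomment on a machine with docker access
```

The same two docker calls could equally be placed in a cron job, as the text suggests.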
makeClusterFunctionsDocker(image, docker.args = character(0L),
image.args = character(0L), scheduler.latency = 1, fs.latency = 65)
image
[character(1)]
Name of the docker image to run.
docker.args
[character]
Additional arguments passed to “docker” *before* the command (“run”, “ps” or “kill”) to execute (e.g., the docker host).
image.args
[character]
Additional arguments passed to “docker run” (e.g., to define mounts or environment variables).
scheduler.latency
[numeric(1)]
Time to sleep after important interactions with the scheduler to ensure a sane state.
Currently only triggered after calling submitJobs.
fs.latency
[numeric(1)]
Expected maximum latency of the file system, in seconds.
Set to a positive number for network file systems like NFS, which enables more robust (but also more expensive) mechanisms to access files and directories.
Usually safe to set to NA to disable the heuristic, e.g. if you are working on a local file system.
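A possible way to use the constructor is sketched below. The image name, docker host and mount are placeholders, the image is assumed to provide R with batchtools installed, and the submit step is left commented out because it needs a working docker setup and a registry directory that the containers can reach.

```r
# Placeholder settings; adjust to your own environment.
docker.image <- "my/r-image"
docker.host <- c("-H", "tcp://swarm.example.org:2375")
docker.mount <- "-v /home:/home"

# Only runs if the batchtools package is installed.
if (requireNamespace("batchtools", quietly = TRUE)) {
  library(batchtools)
  reg <- makeRegistry(file.dir = NA)  # temporary registry, for illustration
  reg$cluster.functions <- makeClusterFunctionsDocker(
    image = docker.image,
    docker.args = docker.host,
    image.args = docker.mount,
    fs.latency = 65
  )
  batchMap(function(x) x^2, x = 1:3, reg = reg)
  # submitJobs(resources = list(ncpus = 1, memory = 1024), reg = reg)
}
```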
Other ClusterFunctions: makeClusterFunctionsInteractive, makeClusterFunctionsLSF, makeClusterFunctionsMulticore, makeClusterFunctionsOpenLava, makeClusterFunctionsSGE, makeClusterFunctionsSSH, makeClusterFunctionsSlurm, makeClusterFunctionsSocket, makeClusterFunctionsTORQUE, makeClusterFunctions