Set and get the number of threads to be used in
data.table functions that are parallelized with OpenMP.
setDTthreads(threads = NULL, restore_after_fork = NULL, percent = NULL)
getDTthreads(verbose = getOption("datatable.verbose"))
threads: NULL (default) rereads environment variables. 0 means to use all logical CPUs available. Otherwise a number >= 1.
restore_after_fork: Should data.table be multi-threaded again after a fork has completed? NULL leaves the current setting unchanged, which by default is TRUE. See details below.
percent: If provided, it should be a number between 2 and 100; the percentage of logical CPUs to use.
verbose: Display the value of relevant OpenMP settings plus the
restore_after_fork internal option.
A length 1
integer. The old value is returned by
setDTthreads so you can store that prior value and pass it to
setDTthreads() again after the section of your code where you control the number of threads.
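The store-and-restore pattern described above can be sketched as follows. The helper name with_dt_threads is hypothetical and not part of data.table; it is only one possible way to make sure the prior value is put back even if the wrapped code fails.

```r
library(data.table)

# Hypothetical helper (not part of data.table): run `expr` with `n` threads,
# then restore whatever thread count was in effect before.
with_dt_threads <- function(n, expr) {
  old <- setDTthreads(n)                    # setDTthreads() returns the old value
  on.exit(setDTthreads(old), add = TRUE)    # restore even if `expr` errors
  force(expr)                               # `expr` is evaluated lazily, i.e. here
}

before <- getDTthreads()
inside <- with_dt_threads(1, getDTthreads())  # evaluated while 1 thread is set
after  <- getDTthreads()
```

Because R evaluates arguments lazily, the expression passed as expr only runs once the thread count has already been changed.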
data.table automatically switches to single threaded mode upon fork (the mechanism used by
mclapply and the foreach package). Otherwise, nested parallelism would very likely overload your CPUs and result in much slower execution. As
data.table becomes more parallel internally, we expect explicit user parallelism to be needed less often. The
restore_after_fork option controls what happens after the explicit fork parallelism completes. It needs to be at C level so it is not a regular R option using
options(). By default
data.table will be multi-threaded again; restoring the prior setting of
getDTthreads(). But problems have been reported in the past on Mac with Intel OpenMP libraries whereas success has been reported on Linux. If you experience problems after fork, start a new R session and change the default behaviour by calling
setDTthreads(restore_after_fork=FALSE) before retrying. Please raise issues on the data.table GitHub issues page.
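A minimal sketch of the fork behaviour described above, assuming a platform where parallel::mclapply can fork (so not Windows) and where two worker processes can be started. It only illustrates the calls involved; the thread counts observed will depend on your machine.

```r
library(data.table)
library(parallel)

threads_before <- getDTthreads()

# Opt out of restoring multi-threading once forked work has finished.
setDTthreads(restore_after_fork = FALSE)

# Each forked child reports how many threads data.table would use there.
in_children <- unlist(mclapply(1:2, function(i) getDTthreads(), mc.cores = 2))

threads_after <- getDTthreads()   # inspect what the parent session uses now

# Set the thread count explicitly again if multi-threading is wanted.
setDTthreads(threads_before)
```

If problems do appear after forking, the advice above is to start a new R session before applying this setting, rather than changing it in a session that has already misbehaved.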
The number of logical CPUs is determined by the OpenMP function
omp_get_num_procs() whose meaning may vary across platforms and OpenMP implementations.
setDTthreads() will not allow more than this limit. Neither will it allow more than
omp_get_thread_limit() nor the current value of
Sys.getenv("OMP_THREAD_LIMIT"). Note that CRAN sets
OMP_THREAD_LIMIT to 2 and this limit should always be respected; i.e., never change
OMP_THREAD_LIMIT in a package to a value greater than 2. Some hardware allows CPUs to be removed and/or replaced while the server is running. If this happens, our understanding is that
omp_get_num_procs() will reflect the new number of processors available. If a processor has been physically removed, or the logical processors reconfigured since data.table started,
setDTthreads(...) will need to be called again by you before data.table will reflect the change. If you have such hardware, please let us know your experience via GitHub issues / feature requests.
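The capping described above can be checked with a short sketch. The request for 10000 threads is an arbitrary illustrative number, assumed to exceed the logical CPU count of the machine running it.

```r
library(data.table)

old <- setDTthreads(0)          # 0 asks for all logical CPUs (subject to limits)
n_all <- getDTthreads()

setDTthreads(10000)             # deliberately ask for far more than is available
n_capped <- getDTthreads()      # expected to be capped rather than honoured

setDTthreads(old)               # restore the prior setting
```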
Use getDTthreads(verbose=TRUE) to see the relevant environment variables, their values and the current number of threads data.table is using. For example, the environment variable
R_DATATABLE_NUM_PROCS_PERCENT can be used to change the default number of logical CPUs from 50% to another value between 2 and 100.
setDTthreads() affects data.table only and does not change R itself or other packages using OpenMP. We have followed the advice of section 1.2.1.1 in the R-exts manual: "… or, better, for the regions in your code as part of their specification… num_threads(nthreads)… That way you only control your own code and not that of other OpenMP users." Every parallel region in data.table contains a
num_threads(getDTthreads()) directive. This is mandated by a
grep in data.table's quality control CRAN release procedure script.
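As a rough illustration of the environment variable and the verbose report mentioned above, the following sketch sets R_DATATABLE_NUM_PROCS_PERCENT from within a session and asks data.table to reread it. In practice the variable would more commonly be set before R starts; setting it with Sys.setenv() here is only for experimentation.

```r
library(data.table)

old <- getDTthreads()

# Ask for 100% of logical CPUs by default instead of 50%.
Sys.setenv(R_DATATABLE_NUM_PROCS_PERCENT = "100")
setDTthreads()                        # threads = NULL rereads environment variables
n <- getDTthreads(verbose = TRUE)     # prints settings and returns the thread count

# Undo the experiment.
Sys.unsetenv("R_DATATABLE_NUM_PROCS_PERCENT")
setDTthreads(old)
```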
setDTthreads(0) is the same as
setDTthreads(percent=100); i.e. use all logical CPUs, subject to
Sys.getenv("OMP_THREAD_LIMIT"). Please note again that CRAN sets OMP_THREAD_LIMIT to 2, so never change
OMP_THREAD_LIMIT in a CRAN package to a value greater than 2.
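The equivalence stated above can be verified with a small sketch. The variable names are illustrative only, and the actual counts depend on the machine and on any OMP_THREAD_LIMIT in effect.

```r
library(data.table)

old <- setDTthreads(0)          # all logical CPUs, subject to limits
n_zero <- getDTthreads()

setDTthreads(percent = 100)     # the same request expressed as a percentage
n_pct <- getDTthreads()

setDTthreads(percent = 50)      # half of the logical CPUs
n_half <- getDTthreads()

setDTthreads(old)               # restore the prior setting
```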