drake (version 6.2.1)

drake_config: Create the runtime parameter list that make() uses internally.


This configuration list is also required for functions such as outdated(). It is meant to be specific to a single call to make(), and you should not modify it by hand afterwards. If you later plan to call make() with different arguments (especially targets), you should refresh the config list with another call to drake_config(). For changes to the targets argument specifically, it is important to recompute the config list to make sure the internal workflow network has all the targets you need. Modifying the targets element afterwards will have no effect, and it could lead to false-negative results from outdated().
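As a rough sketch of that advice, the example below builds one configuration list per set of targets instead of editing an existing list. The plan and its target names (a, b, c) are hypothetical, and the code assumes drake is installed and that writing a .drake/ cache in the working directory is acceptable.

```r
library(drake)

# A small hypothetical plan: b depends on a, and c depends on b.
plan <- drake_plan(a = 1, b = a + 1, c = b + 1)

# Configuration for a make() that builds only "b" and its dependency "a".
config_b <- drake_config(plan, targets = "b")

# To work with "c" instead, do not assign to config_b$targets.
# Recompute the whole list so the workflow network includes "c".
config_c <- drake_config(plan, targets = "c")

outdated(config_b) # Checks only the network behind "b".
outdated(config_c) # Checks the network behind "c".
```

Each call to drake_config() recomputes the workflow network, which is why the second list can see c while the first cannot.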


Usage
drake_config(plan = drake::read_drake_plan(), targets = NULL,
  envir = parent.frame(), verbose = drake::default_verbose(),
  hook = NULL, cache = drake::get_cache(verbose = verbose,
  console_log_file = console_log_file), fetch_cache = NULL,
  parallelism = drake::default_parallelism(), jobs = 1,
  packages = rev(.packages()), prework = character(0),
  prepend = character(0), command = drake::default_Makefile_command(),
  args = drake::default_Makefile_args(jobs = jobs, verbose = verbose),
  recipe_command = drake::default_recipe_command(), timeout = NULL,
  cpu = Inf, elapsed = Inf, retries = 0, force = FALSE,
  log_progress = FALSE, graph = NULL, trigger = drake::trigger(),
  skip_targets = FALSE, skip_imports = FALSE,
  skip_safety_checks = FALSE, lazy_load = "eager",
  session_info = TRUE, cache_log_file = NULL, seed = NULL,
  caching = c("master", "worker"), keep_going = FALSE,
  session = NULL, imports_only = NULL, pruning_strategy = NULL,
  makefile_path = "Makefile", console_log_file = NULL,
  ensure_workers = TRUE, garbage_collection = FALSE,
  template = list(), sleep = function(i) 0.01,
  hasty_build = drake::default_hasty_build,
  memory_strategy = c("speed", "memory", "lookahead"), layout = NULL)


Arguments

plan
workflow plan data frame. A workflow plan data frame is a data frame with a target column and a command column. (See the details in the drake_plan() help file for descriptions of the optional columns.) Targets are the objects and files that drake generates, and commands are the pieces of R code that produce them. Use the function drake_plan() to generate workflow plan data frames easily, and see functions plan_analyses(), plan_summaries(), evaluate_plan(), expand_plan(), and gather_plan() for easy ways to generate large workflow plan data frames.
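To make the two required columns concrete, here is a minimal sketch that assembles a plan by hand with base R rather than with drake_plan(). The target names and commands are hypothetical. The loop at the end only mimics what the commands mean; it is not how drake schedules or builds targets.

```r
# A hand-built workflow plan: one row per target.
# Target names and commands are illustrative assumptions.
plan <- data.frame(
  target = c("raw_data", "fit", "report_table"),
  command = c(
    "mtcars",
    "lm(mpg ~ wt, data = raw_data)",
    "coef(summary(fit))"
  ),
  stringsAsFactors = FALSE
)

# Commands are stored as text: each one is a piece of R code.
# Evaluating them in row order reproduces the targets by hand.
env <- new.env()
for (i in seq_len(nrow(plan))) {
  value <- eval(parse(text = plan$command[i]), envir = env)
  assign(plan$target[i], value, envir = env)
}
ls(env)
```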


targets
character vector, names of targets to build. Dependencies are built too. Together, the plan and targets comprise the workflow network (i.e. the graph argument). Changing either will change the network.


envir
environment to use. Defaults to the current workspace, so you should not need to worry about this most of the time. A deep copy of envir is made, so you don't need to worry about your workspace being modified by make. The deep copy inherits from the global environment. Wherever necessary, objects and functions are imported from envir and the global environment and then reproducibly tracked as dependencies.


verbose
logical or numeric, control printing to the console. Use pkgconfig to set the default value of verbose for your R session: for example, pkgconfig::set_config("drake::verbose" = 2).

  • 0 or FALSE: print nothing.

  • 1 or TRUE: print only targets to build.

  • 2: also print checks and cache info.

  • 3: also print any potentially missing items.

  • 4: also print imports and writes to the cache.




cache
drake cache as created by new_cache(). See also get_cache() and this_cache().


fetch_cache
character vector containing lines of code. The purpose of this code is to fetch the storr cache with a command like storr_rds() or storr_dbi(), but customized. This feature is experimental. It will turn out to be necessary if you are using both custom non-RDS caches and distributed parallelism (parallelism = "future_lapply" or "Makefile") because the distributed R sessions need to know how to load the cache.


parallelism
character, type of parallelism to use. To list the options, call parallelism_choices(). For detailed explanations, see the high-performance computing chapter of the user manual.


jobs
maximum number of parallel workers for processing the targets. If you wish to parallelize the imports and preprocessing as well, you can use a named numeric vector of length 2, e.g. make(jobs = c(imports = 4, targets = 8)). make(jobs = 4) is equivalent to make(jobs = c(imports = 1, targets = 4)).

Windows users should not set jobs > 1 if parallelism is "mclapply" because mclapply() is based on forking. Windows users who use parallelism = "Makefile" will need to download and install Rtools.

You can experiment with predict_runtime() to help decide on an appropriate number of jobs. For details, visit https://ropenscilabs.github.io/drake-manual/time.html.


packages
character vector of packages to load, in the order they should be loaded. Defaults to rev(.packages()), so you should not usually need to set this manually. Just call library() to load your packages before make(). However, sometimes packages need to be strictly forced to load in a certain order, especially if parallelism is "Makefile". To do this, do not use library() or require() or loadNamespace() or attachNamespace() to load any libraries beforehand. Just list your packages in the packages argument in the order you want them to be loaded. If parallelism is "mclapply", the necessary packages are loaded once before any targets are built. If parallelism is "Makefile", the necessary packages are loaded once on initialization and then once again for each target right before that target is built.


prework
character vector of lines of code to run before build time. This code can be used to load packages, set options, etc., although the packages in the packages argument are loaded before any prework is done. If parallelism is "mclapply", the prework is run once before any targets are built. If parallelism is "Makefile", the prework is run once on initialization and then once again for each target right before that target is built.


prepend
lines to prepend to the Makefile if parallelism is "Makefile". See the high-performance computing guide to learn how to use prepend to distribute work on a cluster.


command
character scalar, command to call the Makefile generated for distributed computing. Only applies when parallelism is "Makefile". Defaults to the usual "make" (default_Makefile_command()), but it could also be "lsmake" on supporting systems, for example. command and args are executed via system2(command, args) to run the Makefile. If args has something like "--jobs=2", or if jobs >= 2 and args is left alone, targets will be distributed over independent parallel R sessions wherever possible.


args
command line arguments to call the Makefile for distributed computing. For advanced users only. If set, jobs and verbose are overwritten as they apply to the Makefile. command and args are executed via system2(command, args) to run the Makefile. If args has something like "--jobs=2", or if jobs >= 2 and args is left alone, targets will be distributed over independent parallel R sessions wherever possible.


recipe_command
Character scalar, command for the Makefile recipe for each target.


timeout
deprecated. Use elapsed and cpu instead.


cpu
Same as the cpu argument of setTimeLimit(). Seconds of cpu time before a target times out. Assign target-level cpu timeout times with an optional cpu column in plan.


elapsed
Same as the elapsed argument of setTimeLimit(). Seconds of elapsed time before a target times out. Assign target-level elapsed timeout times with an optional elapsed column in plan.
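Because cpu and elapsed mirror setTimeLimit(), a base R sketch can show the kind of behavior to expect. The helper run_with_limit() and the two example functions are hypothetical and are not part of drake. They only illustrate how a transient limit interrupts a long computation.

```r
# Run f() under a transient time limit, in the spirit of the
# cpu and elapsed arguments. run_with_limit() is a hypothetical helper.
run_with_limit <- function(f, cpu = Inf, elapsed = Inf) {
  setTimeLimit(cpu = cpu, elapsed = elapsed, transient = TRUE)
  # Always lift the limit again, whether f() finishes or not.
  on.exit(setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE))
  tryCatch(
    f(),
    error = function(e) {
      # Any error, including a reached time limit, ends up here.
      structure(conditionMessage(e), class = "build_error")
    }
  )
}

quick <- function() sum(1:10)

slow <- function() {
  # Busy loop that would take about 5 seconds if left alone.
  deadline <- Sys.time() + 5
  while (Sys.time() < deadline) {}
  "finished"
}

quick_result <- run_with_limit(quick, elapsed = 2)
slow_result <- run_with_limit(slow, elapsed = 0.2)
class(slow_result)
```

The quick function returns its value normally, while the slow one should be cut off by the elapsed limit and come back as an error message instead of "finished".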


retries
Number of retries to execute if the target fails. Assign target-level retries with an optional retries column in plan.


force
Logical. If FALSE (default) then drake will stop you if the cache was created with an old and incompatible version of drake. This gives you an opportunity to downgrade drake to a compatible version rather than rerun all your targets from scratch. If force is TRUE, then make() executes your workflow regardless of the version of drake that last ran make() on the cache.


log_progress
logical, whether to log the progress of individual targets as they are being built. Progress logging creates a lot of little files in the cache, and it may make builds a tiny bit slower. So you may see gains in storage efficiency and speed with make(..., log_progress = FALSE). But be warned that progress() and in_progress() will no longer work if you do that.


graph
An igraph object from the previous make(). Supplying a pre-built graph could save time.


trigger
Name of the trigger to apply to all targets. Ignored if plan has a trigger column. See trigger() for details.


skip_targets
logical, whether to skip building the targets in plan and just import objects and files.


skip_imports
logical, whether to totally neglect to process the imports and jump straight to the targets. This can be useful if your imports are massive and you just want to test your project, but it is bad practice for reproducible data analysis. This argument is overridden if you supply your own graph argument.


skip_safety_checks
logical, whether to skip the safety checks on your workflow. Use at your own peril.


lazy_load
either a character vector or a logical. Choices:

  • "eager": no lazy loading. The target is loaded right away with assign().

  • "promise": lazy loading with delayedAssign()

  • "bind": lazy loading with active bindings: bindr::populate_env().

  • TRUE: same as "promise".

  • FALSE: same as "eager".

lazy_load should not be "promise" for "parLapply" parallelism combined with jobs greater than 1. For local multi-session parallelism and lazy loading, try library(future); future::plan(multisession) and then make(..., parallelism = "future_lapply", lazy_load = "bind").

If lazy_load is "eager", drake prunes the execution environment before each target/stage, removing all superfluous targets and then loading any dependencies it will need for building. In other words, drake prepares the environment in advance and tries to be memory efficient. If lazy_load is "bind" or "promise", drake assigns promises to load any dependencies at the last minute. Lazy loading may be more memory efficient in some use cases, but it may duplicate the loading of dependencies, costing time.
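The difference between "eager" and "promise" loading comes down to assign() versus delayedAssign(). The base R sketch below counts how often a stand-in cache is read. The function read_from_cache() and the variable names are hypothetical and do not reflect drake's internals.

```r
# load_count tracks how often the (hypothetical) cache is read.
load_count <- 0
read_from_cache <- function(name) {
  load_count <<- load_count + 1
  paste("value of", name)
}

# "eager": the value is read and assigned right away.
eager_env <- new.env()
assign("dep", read_from_cache("dep"), envir = eager_env)
count_after_eager <- load_count

# "promise": nothing is read until the variable is first used.
lazy_env <- new.env()
delayedAssign("dep", read_from_cache("dep"), assign.env = lazy_env)
count_before_use <- load_count

# The first access forces the promise and triggers the read.
value <- get("dep", envir = lazy_env)
count_after_use <- load_count
```

With the promise, the read is deferred until the first access, which is the trade-off described above: a dependency that is never used is never loaded, but loading happens at the last minute.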


session_info
logical, whether to save the sessionInfo() to the cache. This behavior is recommended for serious make()s for the sake of reproducibility. This argument only exists to speed up tests. Apparently, sessionInfo() is a bottleneck for small make()s.


cache_log_file
Name of the cache log file to write. If TRUE, the default file name is used (drake_cache.log). If NULL, no file is written. If activated, this option uses drake_cache_log_file() to write a flat text file to represent the state of the cache (fingerprints of all the targets and imports). If you put the log file under version control, your commit history will give you an easy representation of how your results change over time as the rest of your project changes. Hopefully, this is a step in the right direction for data reproducibility.


seed
integer, the root pseudo-random number generator seed to use for your project. In make(), drake generates a unique local seed for each target using the global seed and the target name. That way, different pseudo-random numbers are generated for different targets, and this pseudo-randomness is reproducible.

To ensure reproducibility across different R sessions, set.seed() and .Random.seed are ignored and have no effect on drake workflows. Conversely, make() does not usually change .Random.seed, even when pseudo-random numbers are generated. The exceptions to this last point are make(parallelism = "clustermq") and make(parallelism = "clustermq_staged"), because the clustermq package needs to generate random numbers to set up ports and sockets for ZeroMQ.

On the first call to make() or drake_config(), drake uses the random number generator seed from the seed argument. Here, if the seed is NULL (default), drake uses a seed of 0. On subsequent make()s for existing projects, the project's cached seed will be used in order to ensure reproducibility. Thus, the seed argument must either be NULL or the same seed from the project's cache (usually the .drake/ folder). To reset the random number generator seed for a project, use clean(destroy = TRUE).
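The idea of a per-target seed derived from the root seed and the target name can be sketched in base R. The functions target_seed() and draw_for_target() and the arithmetic inside them are assumptions made for illustration, and drake's actual derivation may differ. The sketch also restores .Random.seed afterwards, in line with the note that make() does not usually change it.

```r
# Hypothetical derivation of a per-target seed. The result depends
# only on the root seed and the target name.
target_seed <- function(root_seed, target) {
  codes <- utf8ToInt(target)
  name_code <- sum(codes * seq_along(codes))
  as.integer((root_seed * 31 + name_code) %% .Machine$integer.max)
}

# Draw pseudo-random numbers for one target without leaving a
# trace in the session's global random number generator state.
draw_for_target <- function(root_seed, target, n = 3) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  old_seed <- if (had_seed) get(".Random.seed", envir = globalenv()) else NULL
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      rm(".Random.seed", envir = globalenv())
    }
  })
  set.seed(target_seed(root_seed, target))
  rnorm(n)
}

draw_for_target(0, "a") # Same numbers on every call.
draw_for_target(0, "b") # Different target, different numbers.
```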


caching
character string, only applies to "clustermq", "clustermq_staged", and "future" parallel backends. The caching argument can be either "master" or "worker".

  • "master": Targets are built by remote workers and sent back to the master process. Then, the master process saves them to the cache (config$cache, usually a file system storr). Appropriate if remote workers do not have access to the file system of the calling R session. Targets are cached one at a time, which may be slow in some situations.

  • "worker": Remote workers not only build the targets, but also save them to the cache. Here, caching happens in parallel. However, remote workers need to have access to the file system of the calling R session. Transferring target data across a network can be slow.


keep_going
logical, whether to still keep running make() if targets fail.


session
An optional callr function if you want to build all your targets in a separate master session: for example, make(plan = my_plan, session = callr::r_vanilla). Running make() in a clean, isolated session can enhance reproducibility. But be warned: if you do this, make() will take longer to start. If session is NULL (default), then make() will just use your current R session as the master session. This is slightly faster, but it causes make() to populate your workspace/environment with the last few targets it builds.


imports_only
deprecated. Use skip_targets instead.


pruning_strategy
deprecated. See memory_strategy.


makefile_path
Path to the Makefile for make(parallelism = "Makefile"). If you set this argument to a non-default value, you are responsible for supplying this same path to the args argument so make knows where to find it. Example: make(parallelism = "Makefile", makefile_path = ".drake/.makefile", command = "make", args = "--file=.drake/.makefile")


console_log_file
character scalar, connection object (such as stdout()) or NULL. If NULL, console output will be printed to the R console using message(). If a character scalar, console_log_file should be the name of a flat file, and console output will be appended to that file. If a connection object (e.g. stdout()), warnings and messages will be sent to the connection. For example, if console_log_file is stdout(), warnings and messages are printed to the console in real time (in addition to the usual in-bulk printing after each target finishes).


ensure_workers
logical, whether the master process should wait for the workers to post before assigning them targets. Should usually be TRUE. Set to FALSE for make(parallelism = "future_lapply", jobs = n) (n > 1) when combined with future::plan(future::sequential). This argument only applies to parallel computing with persistent workers (make(parallelism = x), where x could be "mclapply", "parLapply", or "future_lapply").


garbage_collection
logical, whether to call gc() each time a target is built during make().


template
a named list of values to fill in the {{ ... }} placeholders in template files (e.g. from drake_hpc_template_file()). Same as the template argument of clustermq::Q() and clustermq::workers. Enabled for clustermq only (make(parallelism = "clustermq_staged")), not future or batchtools so far. For more information, see the clustermq package: https://github.com/mschubert/clustermq. Some template placeholders such as {{ job_name }} and {{ n_jobs }} cannot be set this way.


sleep
In its parallel processing, drake uses a central master process to check what the parallel workers are doing, and for the affected high-performance computing workflows, wait for data to arrive over a network. In between loop iterations, the master process sleeps to avoid throttling. The sleep argument to make() and drake_config() allows you to customize how much time the master process spends sleeping.

The sleep argument is a function that takes an argument i and returns a numeric scalar, the number of seconds to supply to Sys.sleep() after iteration i of checking. (Here, i starts at 1.) If the checking loop does something other than sleeping on iteration i, then i is reset back to 1.

To sleep for the same amount of time between checks, you might supply something like function(i) 0.01. But to avoid consuming too many resources during heavier and longer workflows, you might use an exponential back-off: say, function(i) { 0.1 + 120 * pexp(i - 1, rate = 0.01) }.
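To get a feel for the two sleep functions mentioned above, the sketch below tabulates the number of seconds each would return at a few iteration counts. The names constant_sleep and backoff_sleep and the chosen iteration values are illustrative only.

```r
# Constant sleep: the same short pause after every check.
constant_sleep <- function(i) 0.01

# Exponential back-off from the text: starts at 0.1 seconds
# and levels off just below 120.1 seconds.
backoff_sleep <- function(i) {
  0.1 + 120 * pexp(i - 1, rate = 0.01)
}

iterations <- c(1, 10, 100, 1000)
sleep_table <- data.frame(
  i = iterations,
  constant = vapply(iterations, constant_sleep, numeric(1)),
  backoff = backoff_sleep(iterations)
)
sleep_table
```

Either function could then be passed as the sleep argument. Because i is reset to 1 whenever the checking loop does real work, the back-off only grows during stretches in which the master process has nothing to do.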


hasty_build
a user-defined function. In "hasty mode" (make(parallelism = "hasty")) this is the function that evaluates a target's command and returns the resulting value. The hasty_build argument has no effect if parallelism is any value other than "hasty".

The function you pass to hasty_build must have arguments target and config. Here, target is a character scalar naming the target being built, and config is a configuration list of runtime parameters generated by drake_config().
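A custom hasty_build function might look like the sketch below. The name timed_hasty_build is hypothetical. The function only adds a timing message and hands the actual work to drake::default_hasty_build(), the default shown in the usage, on the assumption that the default accepts the same target and config arguments.

```r
# A hypothetical hasty_build function: report how long each target
# takes, and delegate the real work to drake's default builder.
timed_hasty_build <- function(target, config) {
  start <- proc.time()[["elapsed"]]
  value <- drake::default_hasty_build(target = target, config = config)
  seconds <- round(proc.time()[["elapsed"]] - start, 3)
  message(target, " built in ", seconds, " s")
  value
}

# Possible use (not run here; my_plan is a placeholder):
# make(my_plan, parallelism = "hasty", hasty_build = timed_hasty_build)
```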


memory_strategy
Character scalar, name of the strategy drake uses to manage targets in memory. For more direct control over which targets drake keeps in memory, see the help file examples of drake_envir(). The memory_strategy argument to make() and drake_config() is an attempt at an automatic catch-all solution. These are the choices.

  • "speed" (default): Once a target is loaded in memory, just keep it there. Maximizes speed, but hogs memory.

  • "memory": For each target, unload everything from memory except the target's direct dependencies. Conserves memory, but sacrifices speed because each new target needs to reload any previously unloaded targets from the cache.

  • "lookahead": keep loaded targets in memory until they are no longer needed as dependencies in downstream build steps. Then, unload them from the environment. This step avoids keeping unneeded data in memory and minimizes expensive reads from the cache. However, it requires looking ahead in the dependency graph, which could add overhead for every target of projects with lots of targets.

Each strategy has a weakness. "speed" is memory-hungry, "memory" wastes time reloading targets from storage, and "lookahead" wastes time traversing the entire dependency graph on every make(). For a better compromise and more control, see the examples in the help file of drake_envir().
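The "lookahead" idea can be illustrated with a toy dependency list in base R. This is only a sketch of the bookkeeping, not drake's implementation, and the targets a through d are hypothetical.

```r
# Direct dependencies of each target, listed in build order.
deps <- list(a = character(0), b = "a", c = "a", d = "c")
build_order <- names(deps)

loaded <- character(0)
kept_after <- list()
for (i in seq_along(build_order)) {
  target <- build_order[i]
  # Building a target needs its dependencies in memory,
  # and the new value is in memory once the build finishes.
  loaded <- union(loaded, c(deps[[target]], target))
  # Look ahead: which targets do later build steps depend on?
  later <- build_order[seq_along(build_order) > i]
  still_needed <- unique(as.character(unlist(deps[later])))
  # Unload everything that no later step needs.
  loaded <- intersect(loaded, still_needed)
  kept_after[[target]] <- loaded
}
kept_after
```

In this toy example, a stays in memory while b and c still need it, and it is dropped once only d remains. The look-ahead over the later targets at every step is the overhead that the text attributes to this strategy.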


layout
config$layout, where config is the return value from a prior call to drake_config(). If your plan or environment have changed since the last make(), do not supply a layout argument. Otherwise, supplying one could save time.


Value
The master internal configuration list of a project.

See Also

make(), drake_plan(), vis_drake_graph()


Run this code
test_with_dir("Quarantine side effects.", {
  load_mtcars_example() # Get the code with drake_example("mtcars").
  # Construct the master internal configuration list.
  config <- drake_config(my_plan)
  vis_drake_graph(config) # See the dependency graph.
  sankey_drake_graph(config) # See the dependency graph as a Sankey diagram.
  # These functions are faster than otherwise
  # because they use the configuration list.
  outdated(config) # Which targets are out of date?
  missed(config) # Which imports are missing?
})
