make: Run your project (build the outdated targets).

Description

This is the central, most important function of the drake package. It runs all the steps of your workflow in the correct order, skipping any work that is already up to date. See https://ropensci.github.io/drake/ for the full documentation, which includes multiple in-depth tutorials.

Usage

make(plan = drake_plan(), targets = drake::possible_targets(plan),
  envir = parent.frame(), verbose = 1, hook = default_hook,
  cache = drake::get_cache(verbose = verbose, force = force),
  fetch_cache = NULL, parallelism = drake::default_parallelism(),
  jobs = 1, packages = rev(.packages()), prework = character(0),
  prepend = character(0), command = drake::default_Makefile_command(),
  args = drake::default_Makefile_args(jobs = jobs, verbose = verbose),
  recipe_command = drake::default_recipe_command(), log_progress = TRUE,
  imports_only = FALSE, timeout = Inf, cpu = NULL, elapsed = NULL,
  retries = 0, force = FALSE, return_config = NULL, graph = NULL,
  trigger = drake::default_trigger(), skip_imports = FALSE,
  skip_safety_checks = FALSE, config = NULL, lazy_load = FALSE,
  session_info = TRUE, cache_log_file = NULL)

Arguments

plan

workflow plan data frame. A workflow plan data frame is a data frame with a target column and a command column. Targets are the objects and files that drake generates, and commands are the pieces of R code that produce them. Use the function drake_plan() to generate workflow plan data frames easily, and see functions analyses(), summaries(), evaluate(), expand(), and gather() for easy ways to generate large workflow plan data frames.
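As a rough sketch of the structure described above, a workflow plan can be pictured as an ordinary data frame with character target and command columns. The target names and the functions named in the commands (simulate_data(), fit_model()) are hypothetical placeholders, and in practice drake_plan() is the intended way to build a plan.

```r
# Hand-built illustration of the plan layout (base R only).
# Target names and commands are hypothetical; drake_plan() normally creates this.
plan <- data.frame(
  target = c("raw_data", "model_fit"),
  command = c("simulate_data(n = 100)", "fit_model(raw_data)"),
  stringsAsFactors = FALSE
)

# Each row pairs one target with the R code that produces it.
str(plan)
```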

targets

character vector, names of targets to build. Dependencies are built too. Together, the plan and targets comprise the workflow network (i.e. the graph argument). Changing either will change the network.

envir

environment to use. Defaults to the current workspace, so you should not need to worry about this most of the time. A deep copy of envir is made, so you don't need to worry about your workspace being modified by make. The deep copy inherits from the global environment. Wherever necessary, objects and functions are imported from envir and the global environment and then reproducibly tracked as dependencies.

verbose

logical or numeric, control printing to the console.

0 or FALSE: print nothing.
1 or TRUE: print checks and targets to build.
2: print checks, targets to build, and any potentially missing items.
3: full verbosity (print checks, targets to build, potentially missing items, and imports).

hook

function with at least one argument. The hook is a wrapper around the code that drake uses to build a target (see the body of drake:::build_in_hook()). Hooks can control the side effects of build behavior. For example, to redirect output and error messages to text files, you might use the built-in silencer_hook(), as in make(my_plan, hook = silencer_hook). The silencer hook is useful for distributed parallelism, where the calling R process does not have control over all the error and output streams. See also output_sink_hook() and message_sink_hook(). For your own custom hooks, treat the first argument as the code that builds a target, and make sure this argument is actually evaluated. Otherwise, the code will not run and none of your targets will build. For example, function(code){force(code)} is a good hook and function(code){message("Avoiding the code")} is a bad hook.
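The difference between the good and bad hooks comes down to R's lazy evaluation of function arguments. The sketch below uses only base R. The build() function is a hypothetical stand-in for the code drake would pass to a hook, not part of drake itself.

```r
# The two hooks quoted in the text above.
good_hook <- function(code) {
  force(code)  # evaluates the argument, so the build code runs
}
bad_hook <- function(code) {
  message("Avoiding the code")  # never touches `code`, so nothing runs
}

# Hypothetical stand-in for a build step: records which targets were built.
built <- character(0)
build <- function(name) {
  built <<- c(built, name)
  invisible(name)
}

good_hook(build("target_a"))  # the argument is forced, so target_a is recorded
bad_hook(build("target_b"))   # the argument is never evaluated

built  # "target_a"
```

Because bad_hook() never evaluates its argument, the call to build("target_b") is silently skipped, which is why a hook that ignores its first argument prevents targets from building.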

cache

drake cache as created by new_cache(). See also get_cache(), this_cache(), and recover_cache().

fetch_cache

character vector containing lines of code. The purpose of this code is to fetch the storr cache with a command like storr_rds() or storr_dbi(), but customized. This feature is experimental. It is necessary if you are using both custom non-RDS caches and distributed parallelism (parallelism = "future_lapply" or "Makefile") because the distributed R sessions need to know how to load the cache.

parallelism

character, type of parallelism to use. To list the options, call parallelism_choices(). For detailed explanations, see ?parallelism_choices, the tutorial vignettes, or the tutorial files generated by drake_example("basic").

jobs

number of parallel processes or jobs to run. See max_useful_jobs() or vis_drake_graph() to help figure out what the number of jobs should be. Windows users should not set jobs > 1 if parallelism is "mclapply" because mclapply() is based on forking. Windows users who use parallelism == "Makefile" will need to download and install Rtools.

For "future_lapply" parallelism, jobs only applies to the imports. To set the max number of jobs for "future_lapply" parallelism, set the workers argument where it exists: for example, call future::plan(multisession(workers = 4)), then call make(your_plan, parallelism = "future_lapply"). You might also try options(mc.cores = jobs), or see ?future::future.options for environment variables that set the max number of jobs.

If parallelism is "Makefile", Makefile-level parallelism is only used for targets in your workflow plan data frame, not imports. To process imported objects and files, drake selects the best parallel backend for your system and uses the number of jobs you give to the jobs argument to make(). To use at most 2 jobs for imports and at most 4 jobs for targets, run make(..., parallelism = "Makefile", jobs = 2, args = "--jobs=4").

packages

character vector of packages to load, in the order they should be loaded. Defaults to rev(.packages()), so you should not usually need to set this manually. Just call library() to load your packages before make(). However, sometimes packages need to be strictly forced to load in a certain order, especially if parallelism is "Makefile". To do this, do not use library() or require() or loadNamespace() or attachNamespace() to load any libraries beforehand. Just list your packages in the packages argument in the order you want them to be loaded. If parallelism is "mclapply", the necessary packages are loaded once before any targets are built. If parallelism is "Makefile", the necessary packages are loaded once on initialization and then once again for each target right before that target is built.

prework

character vector of lines of code to run before build time. This code can be used to load packages, set options, etc., although the packages in the packages argument are loaded before any prework is done. If parallelism is "mclapply", the prework is run once before any targets are built. If parallelism is "Makefile", the prework is run once on initialization and then once again for each target right before that target is built.

prepend

lines to prepend to the Makefile if parallelism is "Makefile". See the vignettes (vignette(package = "drake")) to learn how to use prepend to take advantage of multiple nodes of a supercomputer.

command

character scalar, command to call the Makefile generated for distributed computing. Only applies when parallelism is "Makefile". Defaults to the usual "make" (default_Makefile_command()), but it could also be "lsmake" on supporting systems, for example. command and args are executed via system2(command, args) to run the Makefile. If args has something like "--jobs=2", or if jobs >= 2 and args is left alone, targets will be distributed over independent parallel R sessions wherever possible.

args

command line arguments to call the Makefile for distributed computing. For advanced users only. If set, jobs and verbose are overwritten as they apply to the Makefile. command and args are executed via system2(command, args) to run the Makefile. If args has something like "--jobs=2", or if jobs >= 2 and args is left alone, targets will be distributed over independent parallel R sessions wherever possible.

recipe_command

Character scalar, command for the Makefile recipe for each target.

log_progress

logical, whether to log the progress of individual targets as they are being built. Progress logging creates a lot of little files in the cache, and it may make builds a tiny bit slower. So you may see gains in storage efficiency and speed with make(..., log_progress = FALSE). But be warned that progress() and in_progress() will no longer work if you do that.

imports_only

logical, whether to skip building the targets in plan and just import objects and files.

timeout

Seconds of overall time to allow before imposing a timeout on a target. Passed to R.utils::withTimeout(). Assign target-level timeout times with an optional timeout column in plan.

cpu

Seconds of cpu time to allow before imposing a timeout on a target. Passed to R.utils::withTimeout(). Assign target-level cpu timeout times with an optional cpu column in plan.

elapsed

Seconds of elapsed time to allow before imposing a timeout on a target. Passed to R.utils::withTimeout(). Assign target-level elapsed timeout times with an optional elapsed column in plan.

retries

Number of retries to execute if the target fails. Assign target-level retries with an optional retries column in plan.
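The timeout, cpu, elapsed, and retries arguments set defaults for every target, while the optional plan columns described above override them per target. A minimal sketch of such a plan follows, built by hand with base R. The target names, commands, and numeric values are hypothetical.

```r
# Illustrative plan with optional per-target columns (hypothetical values).
plan <- data.frame(
  target = c("quick_summary", "slow_model"),
  command = c("summarize_data()", "fit_model()"),
  timeout = c(10, 600),  # seconds allowed for each target
  retries = c(0, 2),     # retries to attempt if the target fails
  stringsAsFactors = FALSE
)

plan[, c("target", "timeout", "retries")]
```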

force

Force make() to build your targets even if something about your setup is not quite right: for example, if you are using a version of drake that is not backward compatible with your project's cache.

return_config

Logical, whether to return the internal list of runtime configuration parameters used by make(). This argument is deprecated. Now, a configuration list is always invisibly returned.

graph

An igraph object from the previous make(). Supplying a pre-built graph could save time. The graph is constructed by build_drake_graph(). You can also get one from config(my_plan)$graph. Overrides skip_imports.

trigger

Name of the trigger to apply to all targets. Ignored if plan has a trigger column. Must be in triggers(). See ?triggers for explanations of the choices.

skip_imports

logical, whether to totally neglect to process the imports and jump straight to the targets. This can be useful if your imports are massive and you just want to test your project, but it is bad practice for reproducible data analysis. This argument is overridden if you supply your own graph argument.

skip_safety_checks

logical, whether to skip the safety checks on your workflow. Use at your own peril.

config

optional master configuration list created by drake_config(). Using one could cut out some superfluous overhead. Overrides all other arguments if supplied.

lazy_load

logical. Should always be set to FALSE for "parLapply" parallelism and jobs greater than 1. For local multi-session parallelism and lazy loading, try library(future); future::plan(multisession) and then make(..., parallelism = "future_lapply", lazy_load = TRUE). If lazy_load is FALSE, drake prunes the execution environment before every parallelizable stage, removing all superfluous targets and then loading any dependencies it will need for the targets in the current parallelizable stage. In other words, drake prepares the environment in advance for the whole collection of targets in the stage. If lazy_load is TRUE, drake assigns promises to load any dependencies at the last minute. Lazy loading may be more memory efficient in some use cases, but it may duplicate the loading of dependencies, costing time.
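The idea of a promise that loads a dependency at the last minute can be illustrated with base R's delayedAssign(). This is only an analogy under stated assumptions: load_dependency() is a hypothetical stand-in for reading a target from the cache, and the sketch does not claim to reproduce drake's internals.

```r
# Illustration only: base R promises, not drake's internal implementation.
loads <- 0
load_dependency <- function() {
  loads <<- loads + 1  # count how many times the "load" actually happens
  c(1, 2, 3)           # hypothetical dependency value
}

env <- new.env()
# Register a promise: load_dependency() is not called yet.
delayedAssign("dep", load_dependency(), assign.env = env)
loads_before <- loads   # nothing has been loaded so far

total <- sum(env$dep)   # first access forces the promise and triggers the load
loads_after <- loads
```

The dependency is only loaded when something actually uses it, which is the memory trade-off described above. If several separate sessions each force their own promise, the same dependency is loaded more than once.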

session_info

logical, whether to save the sessionInfo() to the cache. This behavior is recommended for serious make()s for the sake of reproducibility. This argument only exists to speed up tests. Apparently, sessionInfo() is a bottleneck for small make()s.

cache_log_file

Name of the cache log file to write. If TRUE, the default file name is used (drake_cache.log). If NULL, no file is written. If activated, this option uses drake_cache_log_file() to write a flat text file to represent the state of the cache (fingerprints of all the targets and imports). If you put the log file under version control, your commit history will give you an easy representation of how your results change over time as the rest of your project changes. Hopefully, this is a step in the right direction for data reproducibility.

Value

The master internal configuration list, mostly containing arguments to make() and important objects constructed along the way. See config() for more details.

Examples

# NOT RUN {
test_with_dir("Quarantine side effects.", {
load_basic_example() # Get the code with drake_example("basic").
config <- drake_config(my_plan)
outdated(config) # Which targets need to be (re)built?
my_jobs = max_useful_jobs(config) # Depends on what is up to date.
make(my_plan, jobs = 2) # Build what needs to be built.
outdated(config) # Everything is up to date.
# Change one of your imported function dependencies.
reg2 = function(d){
  d$x3 = d$x^3
  lm(y ~ x3, data = d)
}
outdated(config) # Some targets depend on reg2().
vis_drake_graph(config) # See how they fit in an interactive graph.
make(my_plan) # Rebuild just the outdated targets.
outdated(config) # Everything is up to date again.
make(my_plan, cache_log_file = TRUE) # Write a text log file this time.
vis_drake_graph(config) # The colors changed in the graph.
clean() # Start from scratch.
# Rerun with "Makefile" parallelism with at most 4 jobs.
# Requires Rtools on Windows.
# make(my_plan, parallelism = "Makefile", jobs = 4) # nolint
clean() # Start from scratch.
# Specify your own Makefile recipe.
# Requires Rtools on Windows.
# make(my_plan, parallelism = "Makefile", jobs = 4, # nolint
#   recipe_command = "R -q -e") # nolint
})
# }
