evaluate_plan: Use wildcard templating to create a workflow plan data frame from a template data frame.

Description

Wildcards are no longer recommended. Consider using transformations instead. Visit https://ropenscilabs.github.io/drake-manual/plans.html#large-plans for the details.
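
For comparison, here is a minimal sketch of the transformation approach, assuming drake 7.0.0 or later, where target(..., transform = map(...)) is available inside drake_plan(). The functions simulate(), reg1(), and reg2() are placeholders borrowed from the examples on this page; drake_plan() only records the commands and does not call them.

```r
library(drake)

# Assumes drake >= 7.0.0 (transformation interface).
# map() creates one target per value of `dataset`,
# so no separate template plan or wildcard string is needed.
plan <- drake_plan(
  small = simulate(5),
  large = simulate(50),
  regression1 = target(
    reg1(dataset),
    transform = map(dataset = c(small, large))
  ),
  regression2 = target(
    reg2(dataset),
    transform = map(dataset = c(small, large))
  )
)

# Inspect the generated target names and commands.
plan[, c("target", "command")]
```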

Usage

evaluate_plan(plan, rules = NULL, wildcard = NULL, values = NULL,
  expand = TRUE, rename = expand, trace = FALSE,
  columns = "command", sep = "_")

Arguments

plan

Workflow plan data frame, similar to one produced by drake_plan().

rules

Named list with wildcards as names and vectors of replacements as values. This is a way to evaluate multiple wildcards at once. When rules is not NULL, it overrides wildcard and values.

wildcard

Character scalar denoting a wildcard placeholder.

values

Vector of values to replace the wildcard in the drake instructions. Will be treated as a character vector. Must be the same length as plan$command if expand is TRUE.

expand

If TRUE, create new rows in the workflow plan data frame if multiple values are assigned to a single wildcard. If FALSE, each occurrence of the wildcard is replaced with the next entry in the values vector, and the values are recycled.

rename

Logical, whether to rename the targets based on the wildcard values supplied through values or rules. Defaults to the value of expand.

trace

Logical, whether to add columns that trace the wildcard expansion process. These new columns indicate which targets were evaluated and with which wildcards.

columns

Character vector of names of the columns in which to look for and evaluate wildcards.

sep

Character scalar, separator for the names of the new targets generated. For example, in evaluate_plan(drake_plan(x = sqrt(y__)), list(y__ = 1:2), sep = "."), the names of the new targets are x.1 and x.2.

Value

A workflow plan data frame with the wildcards evaluated.

Details

The commands in workflow plan data frames can have wildcard symbols that can stand for datasets, parameters, function arguments, etc. These wildcards can be evaluated over a set of possible values using evaluate_plan().

Specify a single wildcard with the wildcard and values arguments. In each command, the text in wildcard will be replaced by each value in values in turn. Specify multiple wildcards with the rules argument, which overrides wildcard and values when it is not NULL. Here, rules should be a list with wildcards as names and vectors of possible values as list elements.
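
The difference between expand = TRUE and expand = FALSE can be pictured with a small base-R sketch. expand_wildcard() below is a hypothetical helper, not drake's implementation: it handles only the target and command columns of a plain data frame, and it assumes that each command containing the wildcard receives one value when expand is FALSE.

```r
# Hypothetical helper (not part of drake): a simplified picture of
# single-wildcard substitution on a two-column plan.
expand_wildcard <- function(plan, wildcard, values, expand = TRUE, sep = "_") {
  values <- as.character(values)
  hits <- which(grepl(wildcard, plan$command, fixed = TRUE))
  if (!expand) {
    # One value per matching command, recycling `values` as needed.
    picks <- rep_len(values, length(hits))
    for (i in seq_along(hits)) {
      plan$command[hits[i]] <-
        gsub(wildcard, picks[i], plan$command[hits[i]], fixed = TRUE)
    }
    return(plan)
  }
  # One new row per value for every command that contains the wildcard.
  rows <- lapply(seq_len(nrow(plan)), function(i) {
    if (!(i %in% hits)) {
      return(plan[i, c("target", "command"), drop = FALSE])
    }
    data.frame(
      target = paste(plan$target[i], values, sep = sep),
      command = vapply(
        values,
        function(v) gsub(wildcard, v, plan$command[i], fixed = TRUE),
        character(1),
        USE.NAMES = FALSE
      ),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

methods <- data.frame(
  target = c("regression1", "regression2"),
  command = c("reg1(dataset__)", "reg2(dataset__)"),
  stringsAsFactors = FALSE
)

# Every combination of command and value, with renamed targets.
full <- expand_wildcard(methods, "dataset__", c("small", "large"))

# One value per command, target names unchanged.
paired <- expand_wildcard(
  methods, "dataset__", c("small", "large"), expand = FALSE
)
```

In this sketch, full has four rows (each regression applied to each dataset), while paired keeps the two original rows and pairs regression1 with small and regression2 with large.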

Examples

# NOT RUN {
# Create the part of the workflow plan for the datasets.
datasets <- drake_plan(
  small = simulate(5),
  large = simulate(50)
)
# Create a template workflow plan for the analyses.
methods <- drake_plan(
  regression1 = reg1(dataset__),
  regression2 = reg2(dataset__)
)
# Evaluate the wildcards in the template
# to produce the actual part of the workflow plan
# that encodes the analyses of the datasets.
# Create one analysis for each combination of dataset and method.
evaluate_plan(
  methods,
  wildcard = "dataset__",
  values = datasets$target
)
# Only choose some combinations of dataset and analysis method.
ans <- evaluate_plan(
  methods,
  wildcard = "dataset__",
  values = datasets$target,
  expand = FALSE
)
ans
# For the complete workflow plan, row bind the pieces together.
my_plan <- rbind(datasets, ans)
my_plan
# Wildcards for evaluate_plan() do not need the double-underscore suffix.
# Any valid symbol will do.
plan <- drake_plan(
  numbers = sample.int(n = `{Number}`, size = ..size)
)
evaluate_plan(
  plan,
  rules = list(
    "`{Number}`" = c(10, 13),
    ..size = c(3, 4)
  )
)
# Workflow plans can have multiple wildcards.
# Each combination of wildcard values is used,
# except when expand is FALSE.
x <- drake_plan(draws = sample.int(n = N, size = Size))
evaluate_plan(x, rules = list(N = 10:13, Size = 1:2))
# You can use wildcards on columns other than "command"
evaluate_plan(
  drake_plan(
    x = target("1 + 1", cpu = "any"),
    y = target("sqrt(4)", cpu = "always"),
    z = target("sqrt(16)", cpu = "any")
  ),
  rules = list(always = 1:2),
  columns = c("command", "cpu")
)
# With the `trace` argument,
# you can generate columns that show how the wildcards
# were evaluated.
plan <- drake_plan(x = sample.int(n__), y = sqrt(n__))
plan <- evaluate_plan(plan, wildcard = "n__", values = 1:2, trace = TRUE)
print(plan)
# The columns added by the `trace` argument can also be used
# to visualize the wildcard groups as clusters.
plan <- drake_plan(x = sqrt(n__), y = sample.int(n__))
plan <- evaluate_plan(plan, wildcard = "n__", values = 1:2, trace = TRUE)
print(plan)
cache <- storr::storr_environment()
config <- drake_config(plan, cache = cache)
# }
# NOT RUN {
if (requireNamespace("visNetwork", quietly = TRUE)) {
  vis_drake_graph(config, group = "n__", clusters = "1")
  vis_drake_graph(config, group = "n__", clusters = c("1", "2"))
  make(plan, targets = c("x_1", "y_2"), cache = cache)
  # Optionally cluster on columns supplied by `drake_graph_info()$nodes`.
  vis_drake_graph(config, group = "status", clusters = "up to date")
}
# }