drake (version 5.2.1)

predict_runtime: Predict the elapsed runtime of the next call to make() for non-staged parallel backends.

Description

Take the past recorded runtimes from build_times() and use them to predict how the targets will be distributed among the available workers in the next make(). Then, predict the overall runtime to be the runtime of the slowest (busiest) worker. See Details for some caveats.

Usage

predict_runtime(config = drake::read_drake_config(), targets = NULL,
  from_scratch = FALSE, targets_only = FALSE, future_jobs = NULL,
  digits = NULL, jobs = 1, known_times = numeric(0), default_time = 0,
  warn = TRUE)

Arguments

config

Internal runtime parameter list of make(), produced by either make() or drake_config().

targets

Character vector, names of targets. Predict the runtime of building these targets plus dependencies. Defaults to all targets.

from_scratch

Logical; whether to predict a make() build from scratch or to account for the fact that some targets may already be up to date and therefore skipped.

targets_only

Logical; whether to base the calculations on the targets only or on the build times of everything, including the imports.

future_jobs

Deprecated.

digits

Deprecated.

jobs

The jobs argument of your next planned make(). How many targets do you plan to have running simultaneously?

known_times

A named numeric vector with targets/imports as names and hypothetical runtimes in seconds as values. Use this argument to override any of the recorded build times or the default_time.

default_time

Number of seconds to assume for any target or import with no recorded runtime from build_times() and no entry in known_times.

warn

Logical; whether to warn the user about any targets with no available runtime in either known_times or build_times(). The times for these targets default to default_time.

Details

The prediction is only a rough approximation. The algorithm that emulates the workers is not perfect, and it may turn out to perform poorly in some edge cases. It also assumes you are using one of the backends with persistent workers ("mclapply", "parLapply", or "future_lapply"), though the transient worker backends "future" and "Makefile" should be similar. The prediction does not apply to staged parallelism backends such as make(parallelism = "mclapply_staged") or make(parallelism = "parLapply_staged"). The function also assumes that the overhead of initializing make() and any workers is negligible. Use the default_time and known_times arguments to adjust the assumptions as needed.
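The worker emulation described above can be sketched in base R. This is a simplification, not drake's actual internal algorithm: it greedily assigns each runtime to the least busy of `jobs` persistent workers and reports the busiest worker's total as the predicted elapsed time. The function name and the runtimes below are hypothetical.

```r
# Greedy sketch of persistent-worker scheduling (an assumption,
# not drake's exact implementation).
predict_elapsed <- function(times, jobs) {
  workers <- numeric(jobs)                      # running total per worker
  for (t in sort(times, decreasing = TRUE)) {   # longest targets first
    i <- which.min(workers)                     # least busy worker
    workers[i] <- workers[i] + t
  }
  max(workers)                                  # slowest worker bounds the build
}

times <- c(5, rep(7200, 14))  # one quick target, fourteen heavy ones
predict_elapsed(times, jobs = 7)  # 14405: each worker carries two heavy targets
predict_elapsed(times, jobs = 8)  # 14400: an extra worker barely helps
```

This mirrors the load-balancing effect shown in the Examples section: once every worker must carry at least two rate-limiting targets, adding one more worker does not shorten the critical path.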

See Also

predict_load_balancing(), build_times(), make()

Examples

# NOT RUN {
test_with_dir("Quarantine side effects.", {
load_mtcars_example() # Get the code with drake_example("mtcars").
config <- make(my_plan) # Run the project, build the targets.
# The predictions use the cached build times of the targets,
# but if you expect your target runtimes
# to be different, you can specify them (in seconds).
known_times <- c(5, rep(7200, nrow(my_plan) - 1))
names(known_times) <- c(file_store("report.md"), my_plan$target[-1])
known_times
# Predict the runtime
predict_runtime(
  config,
  jobs = 7,
  from_scratch = TRUE,
  known_times = known_times
)
predict_runtime(
  config,
  jobs = 8,
  from_scratch = TRUE,
  known_times = known_times
)
# Why isn't 8 jobs any better?
# 8 would be a good guess based on the layout of the workflow graph.
# It's because of load balancing.
# Below, each row is a persistent worker.
balance <- predict_load_balancing(
  config,
  jobs = 7,
  from_scratch = TRUE,
  known_times = known_times,
  targets_only = TRUE
)
balance
max(balance$time)
# Each worker gets 2 rate-limiting targets.
balance$time
# Even if you add another worker, there will still be workers
# with two heavy targets.
})
# }