predict_load_balancing: Predict the load balancing of the next call to `make()` for non-staged parallel backends.

Description

Take the past recorded runtimes from build_times() and use them to predict how the targets will be distributed among the available workers in the next make().

Usage

predict_load_balancing(config = drake::read_drake_config(),
  targets = NULL, from_scratch = FALSE, targets_only = FALSE,
  future_jobs = NULL, digits = NULL, jobs = 1,
  known_times = numeric(0), default_time = 0, warn = TRUE)

Arguments

config

optional internal runtime parameter list produced by both make() and drake_config().

targets

Character vector, names of targets. Predict the runtime of building these targets plus dependencies. Defaults to all targets.

from_scratch

logical, whether to predict a make() build from scratch or to take into account the fact that some targets may be already up to date and therefore skipped.

targets_only

logical, whether to factor only the targets into the calculations or to use the build times for everything, including the imports.

future_jobs

deprecated

digits

deprecated

jobs

the jobs argument of your next planned make(). How many targets do you plan to have running simultaneously?

known_times

a named numeric vector with targets/imports as names and hypothetical runtimes in seconds as values. Use this argument to override any of the existing build times or the default_time.

default_time

number of seconds to assume for any target or import that has neither a recorded runtime (from build_times()) nor an entry in known_times.

warn

logical, whether to warn the user about any targets with no available runtime, either in known_times or build_times(). The times for these targets default to default_time.
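
The interplay of known_times, default_time, and warn described above can be sketched in base R. This is only an illustration of the stated precedence (known_times first, then recorded build times, then default_time). The helper resolve_times() and the runtimes below are hypothetical and are not part of drake.

```r
# Hypothetical helper, not part of drake. It illustrates the precedence
# known_times > recorded build times > default_time.
resolve_times <- function(targets, recorded, known_times = numeric(0),
                          default_time = 0, warn = TRUE) {
  # Start every target at the default time.
  times <- rep(default_time, length(targets))
  names(times) <- targets
  # Fill in recorded runtimes where they exist.
  has_recorded <- targets %in% names(recorded)
  times[has_recorded] <- recorded[targets[has_recorded]]
  # Let user-supplied times override anything recorded.
  has_known <- targets %in% names(known_times)
  times[has_known] <- known_times[targets[has_known]]
  # Optionally warn about targets that fell back to the default.
  unknown <- targets[!(has_recorded | has_known)]
  if (warn && length(unknown) > 0) {
    warning(
      "No runtime available for: ", paste(unknown, collapse = ", "),
      ". Using default_time = ", default_time, "."
    )
  }
  times
}

# Made-up runtimes standing in for the output of build_times().
recorded <- c(small = 1.2, large = 30)
known <- c(large = 7200)
times <- resolve_times(
  c("small", "large", "report"),
  recorded = recorded,
  known_times = known,
  default_time = 60,
  warn = FALSE
)
times
```

In this sketch, "large" takes its value from known, "small" keeps its recorded runtime, and "report" falls back to the default.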

Value

A list with (1) the total runtime and (2) a list of the names of the targets assigned to each worker. For each worker, targets are listed in the order they are assigned.

Details

The prediction is only a rough approximation. The algorithm that emulates the workers is not perfect, and it may turn out to perform poorly in some edge cases. It assumes you are using one of the backends with persistent workers ("mclapply", "parLapply", or "future_lapply"), though the transient worker backends "future" and "Makefile" should be similar. The prediction does not apply to staged parallelism backends such as make(parallelism = "mclapply_staged") or make(parallelism = "parLapply_staged"). The function also assumes that the overhead of initializing make() and any workers is negligible. Use the default_time and known_times arguments to adjust the assumptions as needed.
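
To give a feel for what emulating persistent workers means, here is a deliberately simplified sketch in base R. It is not drake's actual algorithm: it ignores dependencies between targets and simply hands each target, in order, to whichever worker becomes idle first. The function simulate_workers() and the element names time and workers are assumptions made for illustration.

```r
# Toy emulation of persistent workers. NOT drake's algorithm:
# dependencies between targets are ignored entirely.
simulate_workers <- function(times, jobs = 1) {
  busy_until <- numeric(jobs)                 # when each worker next becomes free
  workers <- rep(list(character(0)), jobs)    # targets assigned to each worker
  for (target in names(times)) {
    w <- which.min(busy_until)                # first worker to become idle
    workers[[w]] <- c(workers[[w]], target)
    busy_until[w] <- busy_until[w] + times[[target]]
  }
  # Total runtime is the finish time of the busiest worker.
  list(time = max(busy_until), workers = workers)
}

# Hypothetical runtimes in seconds.
times <- c(a = 5, b = 3, c = 4, d = 2)
balance <- simulate_workers(times, jobs = 2)
balance$time
balance$workers
```

Even in this toy version, the total runtime depends on the order in which targets are handed out, which is one reason a prediction of this kind is only approximate.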

Examples

# NOT RUN {
test_with_dir("Quarantine side effects.", {
  load_mtcars_example() # Get the code with drake_example("mtcars").
  config <- make(my_plan) # Run the project, build the targets.
  # Hypothetical runtimes in seconds, named by target.
  known_times <- c(5, rep(7200, nrow(my_plan) - 1))
  names(known_times) <- c(file_store("report.md"), my_plan$target[-1])
  known_times
  # Predict the runtime with 7 and then 8 jobs.
  predict_runtime(
    config,
    jobs = 7,
    from_scratch = TRUE,
    known_times = known_times
  )
  predict_runtime(
    config,
    jobs = 8,
    from_scratch = TRUE,
    known_times = known_times
  )
  # Predict how the targets are distributed among 7 workers.
  balance <- predict_load_balancing(
    config,
    jobs = 7,
    from_scratch = TRUE,
    known_times = known_times,
    targets_only = TRUE
  )
  balance
})
# }
