
mlr3hyperband (version 0.2.0)

mlr_optimizers_successive_halving: Hyperparameter Optimization with Successive Halving

Description

OptimizerSuccessiveHalving class that implements the successive halving algorithm. The algorithm samples n points and evaluates them with the smallest budget (lower bound of the budget parameter). With every stage the budget is increased by a factor of eta and only the best 1/eta points are promoted to the next stage. The optimization terminates when the maximum budget is reached (upper bound of the budget parameter).
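
For illustration, assume hypothetical settings of n = 16, eta = 2, and a budget parameter ranging from 1 to 8. The resulting schedule would be:

  stage 0: 16 points evaluated with budget 1
  stage 1:  8 points evaluated with budget 2
  stage 2:  4 points evaluated with budget 4
  stage 3:  2 points evaluated with budget 8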

To identify the budget, the user has to specify explicitly which parameter of the objective function influences the budget by tagging a single parameter in the search_space (paradox::ParamSet) with "budget".
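
A minimal sketch of such a search space (the parameter names and bounds are illustrative):

library(paradox)

search_space = ps(
  x      = p_dbl(-5, 10),
  epochs = p_int(1, 100, tags = "budget") # the single parameter tagged as budget
)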

Parameters

n

integer(1) Number of points in the first stage.

eta

numeric(1) With every stage, the budget per point is increased by a factor of eta and only the best 1/eta points are promoted to the next stage. Non-integer values are supported, but eta must be strictly greater than 1. See the sketch after this list for setting these parameters.

sampler

paradox::Sampler Object defining how the samples of the parameter space are drawn in the first stage. The default is uniform sampling.
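
These parameters can be set when constructing the optimizer via opt() (a minimal sketch; the values are illustrative):

library(bbotk)
library(mlr3hyperband)

optimizer = opt("successive_halving", n = 16, eta = 2)
optimizer$param_set$values  # inspect the settings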

Archive

The bbotk::Archive holds the following additional column that is specific to the successive halving algorithm:

  • stage (integer(1)) Stage index. Starts counting at 0.

Custom sampler

The optimizer supports a custom paradox::Sampler object for drawing the initial configurations of the first stage. A custom sampler may look like this (params refers to the list of paradox parameters that make up the search space):

# - beta distribution with alpha = 2 and beta = 5
# - categorical distribution with custom probabilities
sampler = SamplerJointIndep$new(list(
  Sampler1DRfun$new(params[[2]], function(n) rbeta(n, 2, 5)),
  Sampler1DCateg$new(params[[3]], prob = c(0.2, 0.3, 0.5))
))
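
The sampler can then be passed to the optimizer via its parameter set (a sketch, assuming sampler was constructed as above):

optimizer = opt("successive_halving")
optimizer$param_set$values$sampler = sampler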

Runtime

The calculation of each bracket currently assumes that the runtime is linear in the chosen budget parameter. Hyperband is designed so that each bracket requires approximately the same runtime, because the sum of the budgets over all configurations in a bracket is roughly the same. This no longer holds once the runtime scaling in the budget parameter is not linear, even though the sum of the budgets in each bracket remains the same. A possible adaptation is to introduce a trafo on the budget parameter, as sketched below.
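
A possible sketch of attaching such a trafo to the budget parameter (the names, bounds, and the choice of an exponential transformation are illustrative; the appropriate trafo depends on how the runtime actually scales):

library(paradox)

search_space = ps(
  x1 = p_dbl(-5, 10),
  x2 = p_dbl(0, 15),
  # the schedule is computed on the untransformed scale 1..5, while the
  # objective receives the transformed budget 2^1 .. 2^5
  fidelity = p_int(1, 5, tags = "budget", trafo = function(x) 2^x)
)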

Progress Bars

$optimize() supports progress bars via the package progressr combined with a Terminator. Simply wrap the function call in progressr::with_progress() to enable them. We recommend using the progress package as the backend, which can be enabled with progressr::handlers("progress").
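
A minimal sketch, assuming optimizer and instance objects like those in the examples below:

library(progressr)

handlers("progress")                        # use the progress package as backend
with_progress(optimizer$optimize(instance)) # optimize with a progress bar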

Parallelization

To support general termination criteria and parallelization, points are evaluated in batches of size batch_size. The points of one stage are evaluated in one batch. Parallelization is supported via the package future (see mlr3::benchmark()'s section on parallelization for more details).
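
A minimal sketch of enabling parallel evaluation with future (again assuming optimizer and instance from the examples below):

library(future)

plan("multisession")         # evaluate the points of a stage on multiple workers
optimizer$optimize(instance)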

Logging

mlr3hyperband uses a logger (as implemented in lgr) from package bbotk. Use lgr::get_logger("bbotk") to access and control the logger.
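
For example, to reduce the verbosity of the optimization log:

lgr::get_logger("bbotk")$set_threshold("warn")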

Super class

bbotk::Optimizer -> OptimizerSuccessiveHalving

Methods

Public methods

Method new()

Creates a new instance of this R6 class.

Usage

OptimizerSuccessiveHalving$new()

Method clone()

The objects of this class are cloneable with this method.

Usage

OptimizerSuccessiveHalving$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

library(mlr3hyperband)
library(bbotk)
library(data.table)

# define the search space; a single parameter is tagged with "budget"
search_space = domain = ps(
  x1 = p_dbl(-5, 10),
  x2 = p_dbl(0, 15),
  fidelity = p_dbl(1e-2, 1, tags = "budget")
)

# modified branin function
objective = ObjectiveRFunDt$new(
  fun = function(xdt) {
    a = 1
    b = 5.1 / (4 * (pi ^ 2))
    c = 5 / pi
    r = 6
    s = 10
    t = 1 / (8 * pi)
    data.table(y =
      (a * ((xdt[["x2"]] -
      b * (xdt[["x1"]] ^ 2L) +
      c * xdt[["x1"]] - r) ^ 2) +
      ((s * (1 - t)) * cos(xdt[["x1"]])) +
      s - (5 * xdt[["fidelity"]] * xdt[["x1"]])))
  },
  domain = domain,
  codomain = ps(y = p_dbl(tags = "minimize"))
)

instance = OptimInstanceSingleCrit$new(
  objective = objective,
  search_space = search_space,
  terminator = trm("none")
)

optimizer = opt("successive_halving")

# modifies the instance by reference
optimizer$optimize(instance)

# best scoring evaluation
instance$result

# all evaluations
as.data.table(instance$archive)
