mlr3tuning (version 0.1.0)

TuningInstance: TuningInstance Class

Description

Specifies a general tuning scenario, including a performance evaluator and an archive for Tuners to act upon. This class encodes the black-box objective function that a Tuner has to optimize. It supports the basic operations of querying the objective at design points ($eval_batch()), storing the evaluations in an internal archive, and querying that archive ($archive()).

Evaluations of hyperparameter configurations are performed in batches by calling mlr3::benchmark() internally. Before and after a batch is evaluated, the Terminator is queried for the remaining budget. If the available budget is exhausted, an exception is raised, and no further evaluations can be performed from this point on.

A list of measures can be passed to the instance; all of them are always evaluated. However, single-criteria tuners optimize only the first measure.

The tuner is also expected to store its final result, consisting of the selected hyperparameter configuration and its associated estimated performance values, by calling the method instance$assign_result().
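
For illustration, a minimal sketch (mirroring the objects used in the Examples below) of an instance with several measures; a single-criteria tuner would optimize only the first one, classif.ce, while classif.acc is still evaluated and archived:

library(mlr3)
library(paradox)
library(mlr3tuning)

# Sketch: pass a list of measures; every configuration is scored on all of
# them, but a single-criteria tuner optimizes only the first (classif.ce).
inst = TuningInstance$new(
  task = tsk("iris"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = list(msr("classif.ce"), msr("classif.acc")),
  param_set = ParamSet$new(list(ParamDbl$new("cp", lower = 0.001, upper = 0.1))),
  terminator = term("evals", n_evals = 5)
)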

Format

R6::R6Class object.

Construction

inst = TuningInstance$new(task, learner, resampling, measures,
  param_set, terminator, bm_args = list())

This defines the resampled performance of a learner on a task, a feasibility region for the parameters the tuner is supposed to optimize, and a termination criterion.

Fields

  • task :: mlr3::Task; from construction.

  • learner :: mlr3::Learner; from construction.

  • resampling :: mlr3::Resampling; from construction.

  • measures :: list of mlr3::Measure; from construction.

  • param_set :: paradox::ParamSet; from construction.

  • terminator :: Terminator; from construction.

  • bmr :: mlr3::BenchmarkResult A benchmark result, i.e., the container object holding all mlr3::ResampleResults performed when evaluating hyperparameter configurations.

  • n_evals :: integer(1) Number of configuration evaluations stored in the container.

  • start_time :: POSIXct(1) Time the tuning was started. This is set at the beginning of a Tuner's $tune().

  • result :: named list() Result of the tuning, i.e., the optimal configuration and its estimated performance (a short access sketch follows this list):

    • "perf": Named vector of estimated performance values of the best configuration found.

    • "tune_x": Named list of optimal hyperparameter settings, without potential trafo function applied.

    • "params": Named list of optimal hyperparameter settings, similar to tune_x, but with potential trafo function applied. Also, if the learner had some extra parameters statically set before tuning, these are included here.

Methods

  • eval_batch(dt) data.table::data.table() -> named list() Evaluates all hyperparameter configurations in dt through resampling, where each configuration is a row, and columns are scalar parameters. Updates the internal BenchmarkResult $bmr by reference, and returns a named list with the following elements:

    • "batch_nr": Number of the new batch. This number is calculated in an auto-increment fashion and also stored inside the BenchmarkResult as column batch_nr

    • "uhashes": unique hashes of the added ResampleResults.

    • "perf": A data.table::data.table() of evaluated performances for each row of the dt. Has the same number of rows as dt, and the same number of columns as length of measures. Columns are named with measure-IDs. A cell entry is the (aggregated) performance of that configuration for that measure.

Before and after each batch evaluation, the Terminator is checked; if it signals that the budget is exhausted, an exception of class terminated_error is raised. This function is intended to be called internally by the tuner. A sketch of inspecting the returned list follows the method list.

  • tuner_objective(x) numeric() -> numeric(1) Evaluates an (untransformed) hyperparameter configuration of only numeric values and returns a scalar objective value, which is negated if the measure is maximized. Internally, $eval_batch() is called with a single row. This function serves as an objective function for tuners of purely numeric spaces, and it is always to be minimized (see the optimizer sketch after this list).

  • best(measure = NULL) (mlr3::Measure, character(1)) -> mlr3::ResampleResult Queries the mlr3::BenchmarkResult for the best mlr3::ResampleResult according to measure (default is the first measure in $measures). In case of ties, one of the tied values is selected randomly.

  • archive(unnest = "no") character(1) -> data.table::data.table() Returns a table of the contained resample results, similar to the one returned by mlr3::BenchmarkResult's $aggregate() method. Some interesting columns of this table are:

    • All evaluated measures are included as numeric columns, named with their measure ID.

    • tune_x: A list column that contains the parameter settings the tuner evaluated, without potential trafo applied.

    • params: A list column that contains the parameter settings that were actually used in the learner. Similar to tune_x, but with potential trafo applied. Also, if the learner had some extra parameters statically set before tuning, these are included here.

The argument unnest can take the values "no", "tune_x" or "params". If it is not set to "no", the settings of the respective list column are stored in separate columns instead of the list column, and dependent, inactive parameters are encoded as NA.

  • assign_result(tune_x, perf) (list, numeric) -> NULL The tuner writes the best found list of settings and estimated performance values here. For internal use.

    • tune_x: Must be a named list of settings only of parameters from param_set and be feasible, untransformed.

    • perf: Must be a named numeric vector of performance values, named with the measure IDs, covering all elements in measures.
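
For illustration, a minimal sketch of the list returned by $eval_batch() and of the unnest argument of $archive(); it assumes inst is the iris instance constructed in the Examples below, with budget still remaining on its Terminator:

library(data.table)

# Sketch: evaluate one configuration and inspect the returned list.
res = inst$eval_batch(data.table(cp = 0.02, minsplit = 4))
res$batch_nr  # auto-incremented batch number
res$uhashes   # unique hashes of the added ResampleResults
res$perf      # one row per configuration, one column per measure

# Sketch: unnest the archived parameter settings into separate columns.
inst$archive(unnest = "tune_x")  # tuner-side (untransformed) settings
inst$archive(unnest = "params")  # settings actually used by the learner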
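
As a rough sketch only (this is not how mlr3tuning's Tuners are implemented), $tuner_objective() can in principle be driven by a base-R optimizer such as stats::optim(); the sketch assumes inst is a freshly constructed instance with a purely numeric (ParamDbl) search space, like the classif.debug instance in the Examples below, and catches the terminated_error raised once the budget is exhausted:

# Sketch: treat the instance as a plain numeric objective to be minimized.
# Assumes `inst` has only ParamDbl parameters.
ids = inst$param_set$ids()
lower = inst$param_set$lower
upper = inst$param_set$upper

tryCatch(
  optim(
    par = (lower + upper) / 2,                              # start mid-box
    fn = function(x) inst$tuner_objective(setNames(x, ids)),
    method = "L-BFGS-B", lower = lower, upper = upper
  ),
  terminated_error = function(e) message(as.character(e))
)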

Examples

    library(data.table)
    library(paradox)
    library(mlr3)
    
    # Objects required to define the performance evaluator:
    task = tsk("iris")
    learner = lrn("classif.rpart")
    resampling = rsmp("holdout")
    measures = msr("classif.ce")
    param_set = ParamSet$new(list(
      ParamDbl$new("cp", lower = 0.001, upper = 0.1),
      ParamInt$new("minsplit", lower = 1, upper = 10))
    )
    
    terminator = term("evals", n_evals = 5)
    inst = TuningInstance$new(
      task = task,
      learner = learner,
      resampling = resampling,
      measures = measures,
      param_set = param_set,
      terminator = terminator
    )
    
    # first 4 points as cross product
    design = CJ(cp = c(0.05, 0.01), minsplit = c(5, 3))
    inst$eval_batch(design)
    inst$archive()
    
    # try more points, catch the raised terminated_error
    tryCatch(
      inst$eval_batch(data.table(cp = 0.01, minsplit = 7)),
      terminated_error = function(e) message(as.character(e))
    )
    
    # try another point although the budget is now exhausted
    # -> no extra evaluations
    tryCatch(
      inst$eval_batch(data.table(cp = 0.01, minsplit = 9)),
      terminated_error = function(e) message(as.character(e))
    )
    
    inst$archive()
    
    ### Error handling
    # get a learner which breaks with 50% probability
    # set encapsulation + fallback
    learner = lrn("classif.debug", error_train = 0.5)
    learner$encapsulate = c(train = "evaluate", predict = "evaluate")
    learner$fallback = lrn("classif.featureless")
    
    param_set = ParamSet$new(list(
      ParamDbl$new("x", lower = 0, upper = 1)
    ))
    
    inst = TuningInstance$new(
      task = tsk("wine"),
      learner = learner,
      resampling = rsmp("cv", folds = 3),
      measures = msr("classif.ce"),
      param_set = param_set,
      terminator = term("evals", n_evals = 5)
    )
    
    tryCatch(
      inst$eval_batch(data.table(x = 1:5 / 5)),
      terminated_error = function(e) message(as.character(e))
    )
    
    archive = inst$archive()
    
    # column errors: multiple errors recorded
    print(archive)
    
