mlr3 (version 0.1.0)

benchmark: Benchmark Multiple Learners on Multiple Tasks

Description

Runs a benchmark on arbitrary combinations of learners, tasks, and resampling strategies (possibly in parallel). Resamplings that are not already instantiated are instantiated automatically. However, these auto-instantiated resamplings are not synchronized per task, i.e. different learners may be evaluated on different splits of the same task.

To generate exhaustive designs and automatically instantiate resampling strategies per task, use expand_grid().
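
A minimal sketch of the typical call (assuming tasks, learners and resamplings are lists of Task, Learner and Resampling objects, as in the examples below):

design = expand_grid(tasks, learners, resamplings)  # exhaustive design, instantiated per task
bmr = benchmark(design)                             # returns a BenchmarkResult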

Usage

benchmark(design, ctrl = list())

Arguments

design

:: data.frame() Data frame (or data.table()) with three columns: "task", "learner", and "resampling". Each row defines one resampling run by combining a Task, a Learner, and a Resampling strategy. All resamplings must be properly instantiated. The helper function expand_grid() generates an exhaustive design (see examples) and instantiates the Resamplings per Task.

ctrl

:: (named list()) Object to control learner execution. See mlr_control() for details. Note that, by default, fitted learner models are discarded after prediction in order to save memory.
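
For example, to keep the fitted models, a control object can be passed. The option name store_models below is an assumption for illustration; the available settings are documented in mlr_control():

ctrl = mlr_control(store_models = TRUE)  # assumed option name, see mlr_control()
bmr = benchmark(design, ctrl = ctrl)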

Value

BenchmarkResult.

Parallelization

This function can be parallelized with the future package. Each row in the design creates as many jobs as there are resampling iterations. All jobs are forwarded to the future package together. To select a parallel backend, use future::plan().
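
For instance, to distribute the resampling iterations over multiple local R sessions (a sketch; any backend supported by future can be selected the same way):

future::plan("multisession")
bmr = benchmark(design)  # iterations are now executed in parallel via future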

Examples

# NOT RUN {
# benchmarking with expand_grid()
tasks = mlr_tasks$mget(c("iris", "sonar"))
learners = mlr_learners$mget(c("classif.featureless", "classif.rpart"))
resamplings = mlr_resamplings$mget("cv3")

design = expand_grid(tasks, learners, resamplings)
print(design)

set.seed(123)
bmr = benchmark(design)

## data of all resamplings
head(as.data.table(bmr))

## aggregated performance values
aggr = bmr$aggregate()
print(aggr)

## Extract predictions of first resampling result
rr = aggr$resample_result[[1]]
as.data.table(rr$prediction)

# benchmarking with a custom design:
# - fit classif.featureless on iris with a 3-fold CV
# - fit classif.rpart on sonar using a holdout
design = data.table::data.table(
  task = mlr_tasks$mget(c("iris", "sonar")),
  learner = mlr_learners$mget(c("classif.featureless", "classif.rpart")),
  resampling = mlr_resamplings$mget(c("cv3", "holdout"))
)

## instantiate resamplings
design$resampling = Map(
  function(task, resampling) resampling$clone()$instantiate(task),
  task = design$task, resampling = design$resampling
)

## calculate benchmark
bmr = benchmark(design)
print(bmr)

## get the training set of the 2nd iteration of the featureless learner on iris
rr = bmr$aggregate()[learner_id == "classif.featureless"]$resample_result[[1]]
rr$resampling$train_set(2)
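
## for comparison, the matching test set of the same iteration
## (test_set() is the Resampling counterpart of train_set())
rr$resampling$test_set(2)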
# }
