BenchmarkResult: Container for Results of `benchmark()`

Description

This is the result container object returned by benchmark().

Note that all stored objects are accessed by reference. Do not modify any object without cloning it first.
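The clone-before-modify rule can be sketched as follows. This is a minimal illustration, not part of the original reference; it assumes mlr3 is attached, that `msr()`-style helpers such as `lrn()`, `tsk()` and `rsmp()` are available as in the Examples section, and that the `cp` hyperparameter of `classif.rpart` is unset by default.

```r
library(mlr3)

# Build a small BenchmarkResult to have stored objects to inspect
design = benchmark_grid(
  tasks = list(tsk("iris")),
  learners = list(lrn("classif.rpart")),
  resamplings = rsmp("holdout")
)
bmr = benchmark(design)

# Clone the stored learner before changing anything on it
learner = bmr$learners$learner[[1]]$clone(deep = TRUE)
learner$param_set$values$cp = 0.5

# The learner stored in the BenchmarkResult keeps its original settings
stored_cp = bmr$learners$learner[[1]]$param_set$values$cp
print(stored_cp)
```

Changing the clone leaves the objects referenced by the BenchmarkResult untouched, so later calls such as bmr$aggregate() still describe the experiment that was actually run.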

Construction

bmr = BenchmarkResult$new(data = data.table())

data :: data.table::data.table() Table with data for one resampling iteration per row: Task, Learner, Resampling, iteration (integer(1)), Prediction, and the hash (character(1)) of the corresponding ResampleResult.

Fields

data :: data.table::data.table() Internal data storage. Users are discouraged from working with this field directly.
task_type :: character(1) Task type of objects in the BenchmarkResult. All stored objects (Task, Learner, Prediction) in a single BenchmarkResult are required to have the same task type, e.g., "classif" or "regr".
tasks :: data.table::data.table() Table of used tasks with three columns: "task_hash" (character(1)), "task_id" (character(1)) and "task" (Task).
learners :: data.table::data.table() Table of used learners with three columns: "learner_hash" (character(1)), "learner_id" (character(1)) and "learner" (Learner).
resamplings :: data.table::data.table() Table of used resamplings with three columns: "resampling_hash" (character(1)), "resampling_id" (character(1)) and "resampling" (Resampling).
n_resample_results :: integer(1) Returns the number of stored ResampleResults.
hashes :: character() Vector of hashes of all included ResampleResults.

Methods

aggregate(measures = NULL, ids = TRUE, params = FALSE, warnings = FALSE, errors = FALSE) (list of Measure, logical(1), logical(1), logical(1), logical(1)) -> data.table::data.table() Returns a result table with one row per ResampleResult, i.e. the resampling iterations of each experiment are aggregated. For each Measure, a column with the aggregated performance is added, named with the id of the respective measure.

Additional arguments control the number of additional columns:
- ids :: logical(1) Adds object ids ("task_id", "learner_id", "resampling_id") as extra character columns.
- params :: logical(1) Adds the hyperparameter values as extra list column "params". You can unnest them with mlr3misc::unnest().
- warnings :: logical(1) Adds the number of resampling iterations with at least one warning as extra integer column "warnings".
- errors :: logical(1) Adds the number of resampling iterations with errors as extra integer column "errors".
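As a rough sketch of how these switches change the returned columns, the following call drops the id columns and adds the "params" list column. It assumes mlr3 is attached and that `msr()` is available to construct Measure objects; the `warnings` and `errors` flags are left at their defaults here.

```r
library(mlr3)

design = benchmark_grid(
  tasks = list(tsk("sonar")),
  learners = list(lrn("classif.featureless"), lrn("classif.rpart")),
  resamplings = rsmp("cv", folds = 3)
)
bmr = benchmark(design)

# Measures are passed as a list of Measure objects
measures = list(msr("classif.acc"))

# No id columns, hyperparameters as list column "params"
aggr = bmr$aggregate(measures, ids = FALSE, params = TRUE)
print(names(aggr))
```

The result has one row per ResampleResult, so two rows for this design (one task, two learners, one resampling).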
performance(measures = NULL, ids = TRUE) (list of Measure, logical(1)) -> data.table::data.table() Returns a table with one row for each resampling iteration, including all involved objects: Task, Learner, Resampling, iteration number (integer(1)), and Prediction. If ids is set to TRUE, character columns with the extracted ids are added to the table for convenient filtering: "task_id", "learner_id", and "resampling_id". Additionally, the provided performance measures are calculated and bound to the table as extra columns. These columns are named using the id of the respective Measure.
resample_result(i) (integer(1)) -> ResampleResult Retrieves the i-th ResampleResult.
combine(bmr) (BenchmarkResult | NULL) -> self Fuses a second BenchmarkResult into itself, mutating the BenchmarkResult in-place. If bmr is NULL, simply returns self.

In case of duplicated ResampleResults, an exception is raised. Two ResampleResults are identical iff the hashes of the respective Task, Learner and Resampling are identical. That is, they must operate on exactly the same data, using the same learner with the same hyperparameters, and the same splits into training and test sets.
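The in-place behavior of combine() and the retrieval of a single ResampleResult can be sketched like this. It is an illustration under the assumption that mlr3 is attached; the two benchmarks use different learners, so their ResampleResults are not duplicates of each other.

```r
library(mlr3)

task = tsk("iris")

# Two separate benchmarks on the same task with different learners
bmr1 = benchmark(benchmark_grid(
  tasks = list(task),
  learners = list(lrn("classif.featureless")),
  resamplings = rsmp("holdout")
))
bmr2 = benchmark(benchmark_grid(
  tasks = list(task),
  learners = list(lrn("classif.rpart")),
  resamplings = rsmp("holdout")
))

n_before = bmr1$n_resample_results

# combine() mutates bmr1 in place; bmr2 is not changed
bmr1$combine(bmr2)
n_after = bmr1$n_resample_results

# Passing NULL simply returns the object itself
bmr1$combine(NULL)

# Retrieve the first stored ResampleResult
rr = bmr1$resample_result(1)
print(rr)
```

Because combine() modifies the object it is called on, clone the BenchmarkResult first if the original needs to be kept.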

S3 Methods

as.data.table(bmr) (BenchmarkResult) -> data.table::data.table() Returns a copy of the internal data.

Examples

# NOT RUN {
set.seed(123)
learners = list(
  lrn("classif.featureless", predict_type = "prob"),
  lrn("classif.rpart", predict_type = "prob")
)

design = benchmark_grid(
  tasks = list(tsk("sonar"), tsk("spam")),
  learners = learners,
  resamplings = rsmp("cv", folds = 3)
)
print(design)

bmr = benchmark(design)
print(bmr)

bmr$tasks
bmr$learners

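# first 5 resampling iterations, with performance scores as extra columns
# (as.data.table(bmr) takes no measures; use $performance() with Measure objects)
head(bmr$performance(list(msr("classif.acc"), msr("classif.auc"))), 5)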

# aggregate results
bmr$aggregate()

# aggregate results with hyperparameters as separate columns
mlr3misc::unnest(bmr$aggregate(params = TRUE), "params")

# extract resample result for classif.rpart
rr = bmr$aggregate()[learner_id == "classif.rpart", resample_result][[1]]
print(rr)

# access the confusion matrix of the first resampling iteration
rr$data$prediction[[1]]$confusion
# }
