This is the result container object returned by benchmark().
A BenchmarkResult consists of the row-bound data of multiple ResampleResults, which can easily be reconstructed.
Note that all stored objects are accessed by reference. Do not modify any object without cloning it first.
as.data.table(bmr)
BenchmarkResult -> data.table::data.table()
Returns a copy of the internal data.
c(...)
(BenchmarkResult, ...) -> BenchmarkResult
Combines multiple objects convertible to BenchmarkResult into a new BenchmarkResult.
friedman.test(y, ...)
BenchmarkResult -> "htest"
Applies friedman.test() on the benchmark result, returning an object of class "htest".
data
(data.table::data.table())
Internal data storage with one row per resampling iteration.
Can be combined with $rr_data by joining on column "hash".
We discourage users from working directly with this table.
rr_data
(data.table::data.table())
Internal data storage with one row per ResampleResult (instead of one row per resampling iteration as in $data).
Package developers may opt to add additional columns here. These columns are preserved in all mutators.
Can be combined with $data by (left) joining on the key column "hash".
E.g., mlr3tuning stores additional information for the optimization path in this table.
task_type
(character(1))
Task type of objects in the BenchmarkResult.
All stored objects (Task, Learner, Prediction) in a single BenchmarkResult are required to have the same task type, e.g., "classif" or "regr".
This is NULL for empty BenchmarkResults.
tasks
(data.table::data.table())
Table of included Tasks with three columns: "task_hash" (character(1)), "task_id" (character(1)), and "task" (Task).
learners
(data.table::data.table())
Table of included Learners with three columns: "learner_hash" (character(1)), "learner_id" (character(1)), and "learner" (Learner).
Note that it is not feasible to access learned models via this getter, as the training task would be ambiguous.
For this reason, the returned learners are reset before they are returned.
To access trained models, select a row from the table returned by $score() instead.
resamplings
(data.table::data.table())
Table of included Resamplings with three columns: "resampling_hash" (character(1)), "resampling_id" (character(1)), and "resampling" (Resampling).
n_resample_results
(integer(1))
Returns the total number of stored ResampleResults.
uhashes
(character())
Set of (unique) hashes of all included ResampleResults.
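The read-only fields above can be inspected directly on a benchmark result. The following is a minimal sketch, assuming the mlr3 package is installed; the choice of the "iris" task and the two learners is only for illustration.

```r
library(mlr3)

# Small benchmark: one task, two learners, 3-fold cross-validation.
design = benchmark_grid(
  tasks = tsk("iris"),
  learners = list(lrn("classif.featureless"), lrn("classif.rpart")),
  resamplings = rsmp("cv", folds = 3)
)
bmr = benchmark(design)

bmr$task_type            # task type shared by all stored objects
bmr$n_resample_results   # one ResampleResult per task/learner/resampling combination
bmr$uhashes              # unique hashes, one per ResampleResult
bmr$learners$learner_id  # ids of the included learners
```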
new()
Creates a new instance of this R6 class.
BenchmarkResult$new(data = data.table())
data
(data.table::data.table())
Table with data for one resampling iteration per row, with at least the following columns: "task" (Task), "learner" (Learner), "resampling" (Resampling), "iteration" (integer(1)), "prediction" (Prediction), and "uhash" (character(1)).
Column "uhash" is the unique hash of the corresponding ResampleResult.
Additional columns are kept in the resulting object, but otherwise ignored by BenchmarkResult.
help()
Opens the help page for this object.
BenchmarkResult$help()
format()
Helper for print outputs.
BenchmarkResult$format()
print()
Printer.
BenchmarkResult$print()
combine()
Fuses a second BenchmarkResult into itself, mutating the BenchmarkResult in-place.
If the second BenchmarkResult bmr is NULL, simply returns self.
Note that you can alternatively use the combine function c(), which calls this method internally.
BenchmarkResult$combine(bmr)
bmr
(BenchmarkResult) A second BenchmarkResult object.
Returns the object itself, but modified by reference.
You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
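Because $combine() mutates in-place, a common pattern is to clone first. The sketch below assumes mlr3 is installed and uses a deep clone as the conservative choice; the task and learners are illustrative only.

```r
library(mlr3)

task = tsk("iris")
resampling = rsmp("cv", folds = 3)

# Two separate benchmark results, one learner each.
bmr1 = benchmark(benchmark_grid(task, lrn("classif.featureless"), resampling))
bmr2 = benchmark(benchmark_grid(task, lrn("classif.rpart"), resampling))

# Combine into a deep clone so that bmr1 keeps its previous state.
combined = bmr1$clone(deep = TRUE)
combined$combine(bmr2)

bmr1$n_resample_results      # the original still holds a single ResampleResult
combined$n_resample_results  # the clone now holds both

# c() is the non-mutating alternative and returns a new BenchmarkResult.
combined2 = c(bmr1, bmr2)
```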
score()
Returns a table with one row for each resampling iteration, including all involved objects: Task, Learner, Resampling, iteration number (integer(1)), and Prediction. If ids is set to TRUE, character columns of extracted ids are added to the table for convenient filtering: "task_id", "learner_id", and "resampling_id".
Additionally calculates the provided performance measures and binds the performance scores as extra columns. These columns are named using the id of the respective Measure.
BenchmarkResult$score(measures = NULL, ids = TRUE)
ids
(logical(1))
Adds object ids ("task_id", "learner_id", "resampling_id") as extra character columns for convenient subsetting.
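As a rough illustration of $score(), the sketch below scores each resampling iteration with classification accuracy and then filters on one of the id columns. It assumes mlr3 is installed; the task, learners, and measure are chosen only for the example.

```r
library(mlr3)

bmr = benchmark(benchmark_grid(
  tasks = tsk("iris"),
  learners = list(lrn("classif.featureless"), lrn("classif.rpart")),
  resamplings = rsmp("cv", folds = 3)
))

# One row per resampling iteration; the score column is named after the measure id.
scores = bmr$score(msr("classif.acc"), ids = TRUE)
nrow(scores)  # 2 learners x 3 folds

# The id columns make subsetting convenient.
scores[learner_id == "classif.rpart", c("iteration", "classif.acc")]
```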
aggregate()
Returns a result table where resampling iterations are combined into ResampleResults. A column with the aggregated performance score is added for each Measure, named with the id of the respective measure.
For convenience, different flags can be set to extract more information from the returned ResampleResult:
BenchmarkResult$aggregate(measures = NULL, ids = TRUE, uhashes = FALSE, params = FALSE, conditions = FALSE)
ids
(logical(1))
Adds object ids ("task_id", "learner_id", "resampling_id") as extra character columns for convenient subsetting.
uhashes
(logical(1))
Adds the uhash values of the ResampleResult as extra character column "uhash".
params
(logical(1))
Adds the hyperparameter values as extra list column "params". You can unnest them with mlr3misc::unnest().
conditions
(logical(1))
Adds the number of resampling iterations with at least one warning as extra integer column "warnings", and the number of resampling iterations with errors as extra integer column "errors".
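The flags described above can be combined in a single call. This is a small sketch, assuming mlr3 is installed; the task, learners, and measure are illustrative.

```r
library(mlr3)

bmr = benchmark(benchmark_grid(
  tasks = tsk("iris"),
  learners = list(lrn("classif.featureless"), lrn("classif.rpart")),
  resamplings = rsmp("cv", folds = 3)
))

# One row per ResampleResult, with the aggregated accuracy plus the optional
# "uhash", "params", "warnings", and "errors" columns.
agg = bmr$aggregate(
  msr("classif.acc"),
  uhashes = TRUE,
  params = TRUE,
  conditions = TRUE
)
names(agg)
```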
filter()
Subsets the benchmark result. If task_ids is not NULL, keeps all tasks with the provided task ids and discards all others. The same procedure applies to learner_ids and resampling_ids.
BenchmarkResult$filter(task_ids = NULL, learner_ids = NULL, resampling_ids = NULL)
task_ids
(character())
Ids of Tasks to keep.
learner_ids
(character())
Ids of Learners to keep.
resampling_ids
(character())
Ids of Resamplings to keep.
Returns the object itself, but modified by reference.
You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
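Since $filter() modifies the object by reference, the following sketch filters a deep clone and leaves the original untouched. It assumes mlr3 is installed; the learner ids are those of the illustrative learners used here.

```r
library(mlr3)

bmr = benchmark(benchmark_grid(
  tasks = tsk("iris"),
  learners = list(lrn("classif.featureless"), lrn("classif.rpart")),
  resamplings = rsmp("cv", folds = 3)
))

# Filter a clone so that the full benchmark result stays available.
rpart_only = bmr$clone(deep = TRUE)
rpart_only$filter(learner_ids = "classif.rpart")

rpart_only$n_resample_results  # only the rpart result remains
bmr$n_resample_results         # the original keeps both results
```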
resample_result()
Retrieves the i-th ResampleResult, by position or by unique hash uhash.
i and uhash are mutually exclusive.
BenchmarkResult$resample_result(i = NULL, uhash = NULL)
i
(integer(1))
Position of the ResampleResult to retrieve.
uhash
(character(1))
The uhash value to filter for.
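Both retrieval modes can be sketched as follows, assuming mlr3 is installed. The hash is taken from $uhashes rather than written out, since hash values differ between runs.

```r
library(mlr3)

bmr = benchmark(benchmark_grid(
  tasks = tsk("iris"),
  learners = list(lrn("classif.featureless"), lrn("classif.rpart")),
  resamplings = rsmp("cv", folds = 3)
))

# Retrieve by position ...
rr_first = bmr$resample_result(1)

# ... or by unique hash (do not pass both arguments at once).
hash = bmr$uhashes[2]
rr_by_hash = bmr$resample_result(uhash = hash)

# The retrieved ResampleResult carries the hash it was looked up by.
identical(rr_by_hash$uhash, hash)
```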
clone()
The objects of this class are cloneable with this method.
BenchmarkResult$clone(deep = FALSE)
deep
Whether to make a deep clone.
library(mlr3)
set.seed(123)
learners = list(
lrn("classif.featureless", predict_type = "prob"),
lrn("classif.rpart", predict_type = "prob")
)
design = benchmark_grid(
tasks = list(tsk("sonar"), tsk("spam")),
learners = learners,
resamplings = rsmp("cv", folds = 3)
)
print(design)
bmr = benchmark(design)
print(bmr)
bmr$tasks
bmr$learners
# first 5 individual resamplings
head(as.data.table(bmr, measures = c("classif.acc", "classif.auc")), 5)
# aggregate results
bmr$aggregate()
# aggregate results with hyperparameters as separate columns
mlr3misc::unnest(bmr$aggregate(params = TRUE), "params")
# extract resample result for classif.rpart
rr = bmr$aggregate()[learner_id == "classif.rpart", resample_result][[1]]
print(rr)
# access the confusion matrix of the first resampling iteration
rr$predictions()[[1]]$confusion