Method new()
Creates a new instance of this R6 class.
Usage
BenchmarkResult$new(data = NULL)
Arguments
data
(ResultData)
An object of type ResultData, either extracted from another ResampleResult, another BenchmarkResult, or manually constructed with as_result_data().
Method help()
Opens the help page for this object.
Usage
BenchmarkResult$help()
Method print()
Printer.
Usage
BenchmarkResult$print()
Method combine()
Fuses a second BenchmarkResult into itself, mutating the BenchmarkResult in-place.
If the second BenchmarkResult bmr is NULL, simply returns self.
Note that you can alternatively use the combine function c(), which calls this method internally.
Usage
BenchmarkResult$combine(bmr)
Arguments
bmr
(BenchmarkResult)
A second BenchmarkResult object.
Returns
Returns the object itself, but modified by reference.
You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
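A minimal sketch of combining two benchmark results; the tasks, learners, and resampling chosen here are only illustrative:
design1 = benchmark_grid(tsk("iris"), lrn("classif.featureless"), rsmp("holdout"))
design2 = benchmark_grid(tsk("iris"), lrn("classif.debug"), rsmp("holdout"))
bmr1 = benchmark(design1)
bmr2 = benchmark(design2)
bmr1$combine(bmr2)  # bmr1 now also contains the resample results of bmr2
# equivalently, c(bmr1, bmr2) calls this method internally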
Method marshal()
Marshals all stored models.
Usage
BenchmarkResult$marshal(...)
Arguments
...
(any)
Additional arguments passed to marshal_model().
Method unmarshal()
Unmarshals all stored models.
Usage
BenchmarkResult$unmarshal(...)
Arguments
...
(any)
Additional arguments passed to unmarshal_model().
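A minimal sketch of marshaling and unmarshaling stored models, assuming the benchmark was run with store_models = TRUE; for models that do not support marshaling the calls are assumed to have no effect:
design = benchmark_grid(tsk("iris"), lrn("classif.debug"), rsmp("holdout"))
bmr = benchmark(design, store_models = TRUE)
bmr$marshal()    # convert stored models into a serializable form, e.g. before saveRDS()
bmr$unmarshal()  # restore the stored models to their original form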
Method score()
Returns a table with one row for each resampling iteration, including
all involved objects: Task, Learner, Resampling, iteration number
(integer(1)), and Prediction. If ids is set to TRUE, character
columns of extracted ids are added to the table for convenient
filtering: "task_id", "learner_id", and "resampling_id".
Additionally calculates the provided performance measures and binds the
performance scores as extra columns. These columns are named using the id of
the respective Measure.
Usage
BenchmarkResult$score(
measures = NULL,
ids = TRUE,
conditions = FALSE,
predictions = TRUE
)
Arguments
measures
(Measure | list of Measure)
Measure(s) to calculate.
ids
(logical(1))
Adds object ids ("task_id", "learner_id", "resampling_id") as extra character columns to the returned table.
conditions
(logical(1))
Adds condition messages ("warnings", "errors") as extra list columns of character vectors to the returned table.
predictions
(logical(1))
Additionally return prediction objects, one column for each predict_set of all learners combined.
Columns are named "prediction_train", "prediction_test" and "prediction_internal_valid", if present.
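A minimal sketch of scoring a benchmark result with a classification error measure; the design is only illustrative:
design = benchmark_grid(
  tsks(c("iris", "sonar")),
  lrns(c("classif.debug", "classif.featureless")),
  rsmp("cv", folds = 3)
)
bmr = benchmark(design)
scores = bmr$score(msr("classif.ce"))
head(scores[, .(task_id, learner_id, iteration, classif.ce)])  # one row per resampling iteration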
Method obs_loss()
Calculates the observation-wise loss via the loss function set in the Measure's field obs_loss.
Returns a data.table() with the columns row_ids, truth, response and one additional numeric column for each measure, named with the respective measure id.
If there is no observation-wise loss function for the measure, the column is filled with NA values.
Note that some measures, such as RMSE, do have an $obs_loss, but they require an additional transformation after aggregation, in this example taking the square root.
Usage
BenchmarkResult$obs_loss(measures = NULL, predict_sets = "test")
Arguments
measures
(Measure | list of Measure)
Measure(s) to calculate.
predict_sets
(character())
The predict sets.
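A minimal sketch of extracting observation-wise losses, using a regression setup where the squared error of msr("regr.mse") is defined per observation; the design is only illustrative:
design = benchmark_grid(tsk("mtcars"), lrn("regr.featureless"), rsmp("holdout"))
bmr = benchmark(design)
head(bmr$obs_loss(msr("regr.mse")))  # columns: row_ids, truth, response, regr.mse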
Method aggregate()
Returns a result table where resampling iterations are combined into
ResampleResults. A column with the aggregated performance score is
added for each Measure, named with the id of the respective measure.
The method for aggregation is controlled by the Measure, e.g. micro
aggregation, macro aggregation or custom aggregation. Most measures
default to macro aggregation.
Note that the aggregated performances just give a quick impression of which
approaches work well and which are probably underperforming.
However, the aggregates do not account for variance and cannot replace
a statistical test.
See mlr3viz to get a better impression via boxplots or
mlr3benchmark for critical difference plots and
significance tests.
For convenience, different flags can be set to extract more
information from the returned ResampleResult.
Usage
BenchmarkResult$aggregate(
measures = NULL,
ids = TRUE,
uhashes = FALSE,
params = FALSE,
conditions = FALSE
)
Arguments
measures
(Measure | list of Measure)
Measure(s) to calculate.
ids
(logical(1))
Adds object ids ("task_id", "learner_id", "resampling_id") as extra character columns for convenient subsetting.
uhashes
(logical(1))
Adds the uhash values of the ResampleResult as extra character column "uhash".
params
(logical(1))
Adds the hyperparameter values as extra list column "params". You can unnest them with mlr3misc::unnest().
conditions
(logical(1))
Adds the number of resampling iterations with at least one warning as extra integer column "warnings", and the number of resampling iterations with errors as extra integer column "errors".
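A minimal sketch of aggregating a benchmark result into one row per ResampleResult; the design is only illustrative:
design = benchmark_grid(
  tsks(c("iris", "sonar")),
  lrns(c("classif.debug", "classif.featureless")),
  rsmp("cv", folds = 3)
)
bmr = benchmark(design)
bmr$aggregate(msr("classif.ce"), conditions = TRUE)  # one row per task/learner/resampling combination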
Method filter()
Subsets the benchmark result.
You can either directly provide the row IDs or the uhashes of the resample results to keep,
or use the learner_ids, task_ids and resampling_ids arguments to filter for learner, task and resampling IDs.
The three options are mutually exclusive.
Usage
BenchmarkResult$filter(
i = NULL,
uhashes = NULL,
learner_ids = NULL,
task_ids = NULL,
resampling_ids = NULL
)
Arguments
i
(integer() | NULL)
The iteration values to filter for.
uhashes
(character() | NULL)
The uhashes of the resample results to filter for.
learner_ids
(character() | NULL)
The learner IDs to filter for.
task_ids
(character() | NULL)
The task IDs to filter for.
resampling_ids
(character() | NULL)
The resampling IDs to filter for.
Returns
Returns the object itself, but modified by reference.
You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
Examples
design = benchmark_grid(
tsks(c("iris", "sonar")),
lrns(c("classif.debug", "classif.featureless")),
rsmp("holdout")
)
bmr = benchmark(design)
bmr
bmr2 = bmr$clone(deep = TRUE)
bmr2$filter(learner_ids = "classif.featureless")
bmr2
Method resample_result()
Retrieve the i-th ResampleResult by position, by unique hash uhash, or by learner, task and resampling IDs.
All three options are mutually exclusive.
Usage
BenchmarkResult$resample_result(
i = NULL,
uhash = NULL,
task_id = NULL,
learner_id = NULL,
resampling_id = NULL
)
Arguments
i
(integer(1) | NULL)
The iteration value to filter for.
uhash
(character(1) | NULL)
The unique identifier to filter for.
task_id
(character(1) | NULL)
The task ID to filter for.
learner_id
(character(1) | NULL)
The learner ID to filter for.
resampling_id
(character(1) | NULL)
The resampling ID to filter for.
Examples
design = benchmark_grid(
tsk("iris"),
lrns(c("classif.debug", "classif.featureless")),
rsmp("holdout")
)
bmr = benchmark(design)
bmr$resample_result(learner_id = "classif.featureless")
bmr$resample_result(i = 1)
bmr$resample_result(uhash = uhashes(bmr, learner_id = "classif.debug"))
Method discard()
Shrinks the BenchmarkResult by discarding parts of the internally stored data.
Note that certain operations might stop working, e.g. extracting
importance values from learners or calculating measures requiring the task's data.
Usage
BenchmarkResult$discard(backends = FALSE, models = FALSE)
Arguments
backends
(logical(1))
If TRUE, the DataBackend is removed from all stored Tasks.
models
(logical(1))
If TRUE, the stored model is removed from all Learners.
Returns
Returns the object itself, but modified by reference.
You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
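A minimal sketch of shrinking a benchmark result before serializing it; the design is only illustrative:
design = benchmark_grid(tsk("iris"), lrn("classif.featureless"), rsmp("holdout"))
bmr = benchmark(design, store_models = TRUE)
bmr$discard(backends = TRUE, models = TRUE)
# the object is now considerably smaller, e.g. before saveRDS(bmr, "bmr.rds")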
Method set_threshold()
Sets the threshold for the response prediction of classification learners, provided they have output a probability prediction for a binary classification task.
The resample results for which to change the threshold can either be specified directly via uhashes, by selecting the specific iterations (i), or by filtering according to learner, task and resampling IDs.
If none of the three options is specified, the threshold is set for all resample results.
Usage
BenchmarkResult$set_threshold(
threshold,
i = NULL,
uhashes = NULL,
learner_ids = NULL,
task_ids = NULL,
resampling_ids = NULL,
ties_method = "random"
)
Arguments
threshold
(numeric(1))
Threshold value.
i
(integer() | NULL)
The iteration values to filter for.
uhashes
(character() | NULL)
The unique identifiers of the ResampleResults for which the threshold should be set.
learner_ids
(character() | NULL)
The learner IDs for which the threshold should be set.
task_ids
(character() | NULL)
The task IDs for which the threshold should be set.
resampling_ids
(character() | NULL)
The resampling IDs for which the threshold should be set.
ties_method
(character(1))
Method to handle ties in probabilities when selecting a class label.
Must be one of "random", "first" or "last" (corresponding to the same options in max.col()).
"random": Randomly select one of the tied class labels (default).
"first": Select the first class label among tied values.
"last": Select the last class label among tied values.
Examples
design = benchmark_grid(
tsk("sonar"),
lrns(c("classif.debug", "classif.featureless"), predict_type = "prob"),
rsmp("holdout")
)
bmr = benchmark(design)
bmr$set_threshold(0.8, learner_ids = "classif.featureless")
bmr$set_threshold(0.3, i = 2)
bmr$set_threshold(0.7, uhashes = uhashes(bmr, learner_ids = "classif.featureless"))
Method clone()
The objects of this class are cloneable with this method.
Usage
BenchmarkResult$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.