Resampling: Resampling Class

Description

This is the abstract base class for resampling objects like ResamplingCV and ResamplingBootstrap.

The objects of this class define how a task is partitioned for resampling (e.g., in resample() or benchmark()), using a set of hyperparameters such as the number of folds in cross-validation.

Resampling objects can be instantiated on a Task. Instantiation applies the strategy to the task and yields a fixed partition of the task's row_ids.

Predefined resamplings are stored in the Dictionary mlr_resamplings, e.g. cv or bootstrap.
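
To make the idea of a "fixed partition of row_ids" concrete, here is a minimal conceptual sketch in base R. It is not how the package implements instantiation; the row ids, the fold count, and the helper functions `train_set()` and `test_set()` are hypothetical stand-ins chosen only for illustration.

```r
# Conceptual sketch in base R; NOT the package's actual implementation.
set.seed(1)
row_ids = 1:10   # hypothetical row ids of a small task
folds = 3        # hyperparameter of the (cross-validation-like) strategy

# "Instantiating" fixes the assignment of every row id to one fold.
fold_of_row = sample(rep_len(seq_len(folds), length(row_ids)))

# Hypothetical helpers: the i-th test set is fold i, the training set is the rest.
test_set = function(i) row_ids[fold_of_row == i]
train_set = function(i) row_ids[fold_of_row != i]

test_set(1)
train_set(1)
```

Once `fold_of_row` is drawn, repeated calls to `train_set(i)` and `test_set(i)` always return the same ids, which is the property that lets different learners be compared on identical splits.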

Format

R6::R6Class object.

Construction

Note: This object is typically constructed via a derived class, e.g. ResamplingCV or ResamplingHoldout.

r = Resampling$new(id, param_set, param_vals)

id :: character(1) Identifier for the resampling strategy.
param_set :: paradox::ParamSet Set of hyperparameters.
param_vals :: named list() List of hyperparameter settings.

Fields

id :: character(1) Identifier of the resampling strategy.
param_set :: paradox::ParamSet Description of available hyperparameters and hyperparameter settings.
hash :: character(1) Hash (unique identifier) for this object.
instance :: any During instantiate(), the instance is stored in this slot. The instance can be in an arbitrary format.
is_instantiated :: logical(1) TRUE if the resampling has been instantiated.
duplicated_ids :: logical(1) TRUE if this resampling strategy may produce duplicated row ids within a single training set or test set. E.g., this is TRUE for bootstrap and FALSE for cross-validation.
iters :: integer(1) Number of resampling iterations, depending on the values stored in the param_set.
task_hash :: character(1) The hash of the task which was passed to r$instantiate().
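
The fields above can be inspected before and after instantiation. The following is a small sketch that assumes the mlr3 package is installed and that the dictionaries `mlr_resamplings` and `mlr_tasks` are available as in the Examples section; the comparison against `task$hash` assumes that tasks expose their hash in a field of that name.

```r
library(mlr3)

r = mlr_resamplings$get("cv")
r$id                # identifier of the resampling strategy
r$is_instantiated   # FALSE: no task has been passed to instantiate() yet
r$iters             # number of iterations implied by the hyperparameters

task = mlr_tasks$get("iris")
r$instantiate(task)

r$is_instantiated           # TRUE after instantiate()
r$task_hash == task$hash    # assumption: the task exposes its hash as task$hash
r$duplicated_ids            # FALSE for cross-validation, per the field description
```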

Methods

instantiate(task) Task -> self Materializes fixed training and test splits for a given task and stores them in r$instance.
train_set(i) integer(1) -> (integer() | character()) Returns the row ids of the i-th training set.
test_set(i) integer(1) -> (integer() | character()) Returns the row ids of the i-th test set.
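
As an illustration of how the three methods fit together, the sketch below loops over all iterations of a 3-fold cross-validation and checks two properties one would expect from that strategy. It assumes the mlr3 package is installed, that "cv" has a `folds` hyperparameter, and that the task exposes its row ids as `task$row_ids`.

```r
library(mlr3)

task = mlr_tasks$get("iris")
r = mlr_resamplings$get("cv")
r$param_set$values = list(folds = 3)   # assumption: "cv" is controlled by `folds`
r$instantiate(task)

for (i in seq_len(r$iters)) {
  train = r$train_set(i)
  test = r$test_set(i)
  # Within one iteration, no row id should appear in both sets.
  stopifnot(length(intersect(train, test)) == 0)
  cat(sprintf("iteration %d: %d train rows, %d test rows\n",
    i, length(train), length(test)))
}

# Across iterations, the CV test sets should cover every row id of the task.
all_test = unlist(lapply(seq_len(r$iters), function(i) r$test_set(i)))
setequal(all_test, task$row_ids)
```

The second check is specific to cross-validation; strategies such as subsampling or bootstrap are not expected to cover every row id in their test sets.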

Examples

# NOT RUN {
r = mlr_resamplings$get("subsampling")

# Default parametrization
r$param_set$values

# Do only 3 repeats on 10% of the data
r$param_set$values = list(ratio = 0.1, repeats = 3)
r$param_set$values

# Instantiate on iris task
task = mlr_tasks$get("iris")
r$instantiate(task)

# Extract train/test sets
train_set = r$train_set(1)
print(train_set)
intersect(train_set, r$test_set(1))

# Another example: 10-fold CV
r = mlr_resamplings$get("cv")$instantiate(task)
r$train_set(1)

# Stratification
task = mlr_tasks$get("pima")
prop.table(table(task$truth())) # moderately unbalanced

r = mlr_resamplings$get("subsampling")
r$instantiate(task)
prop.table(table(task$truth(r$train_set(1)))) # roughly same proportion
# }
