mlr3 (version 0.1.0-9000)

Experiment: Experiment

Description

Container object for a machine learning experiment. After initialization with a Task and a Learner, the experiment is conducted by calling the methods $train(), $predict() and $score().

Format

R6::R6Class object.

Construction

Experiment$new(task = NULL, learner = NULL, ctrl = list())
  • task :: (Task | character(1)) May be NULL during initialization, but is mandatory to train the Experiment. Instead of a Task object, it is also possible to provide a key to retrieve a task from the mlr_tasks dictionary. The task will be cloned during initialization.

  • learner :: (Learner | character(1)) May be NULL during initialization, but is mandatory to train the Experiment. Instead of a Learner object, it is also possible to provide a key to retrieve a learner from the mlr_learners dictionary. The learner will be cloned during initialization.

  • ctrl :: named list() Control object, see mlr_control().

Fields

  • ctrl :: list() Control settings passed during initialization.

  • data :: named list() See section "Internal Data Storage".

  • has_errors :: named logical(2) Logical vector with names "train" and "predict". An element is TRUE if any error has been recorded in the log for the respective step.

  • hash :: character(1) Hash (unique identifier) for this object.

  • model :: any Access the trained model. Only available after the experiment has been trained.

  • performance :: named numeric() Access the performance scores as returned by the Measure stored in the Task.

  • prediction :: Prediction Access the individual predictions of the model stored in the Learner.

  • seeds :: integer(3) Named integer vector of random number generator seeds passed to set.seed() prior to calling external code in train(), predict() or score(). Names must match "train", "predict" and "score". Set to NA to disable seeding (default).

  • state :: ordered(1) Returns the state of the experiment as an ordered factor with levels "undefined", "defined", "trained", "predicted", and "scored".

  • task :: Task Access to the stored Task.

  • learner :: Learner Access to the stored Learner. If the experiment has been fitted, the model is stored in slot $model.

  • timings :: named numeric(3) Stores the elapsed time for the steps train(), predict() and score() in seconds with up to millisecond accuracy (cf. proc.time()). Timings are NA if the respective step has not been performed yet.

  • train_set :: (integer() | character()) The row ids of the Task for the training set used in $train(). You can assign a vector of ids to this field. Doing so resets the experiment to the state before the training step.

  • test_set :: (integer() | character()) The row ids of the Task for the test set used in $predict(). You can assign a vector of ids to this field. Doing so resets the experiment to the state before the predict step.

  • validation_set :: (integer() | character()) The row ids of the validation set of the Task. Validation sets are not yet completely integrated into the package.
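
The assignment behaviour described for train_set can be sketched as follows. This is only an illustration, assuming the Experiment API documented on this page (mlr3 0.1.0-9000) and the "iris" task and "classif.rpart" learner keys used in the Examples section. The variable names state_before and state_after are introduced here for illustration.

```r
library(mlr3)

# Assumes the Experiment class of mlr3 0.1.0-9000 as documented on this page.
e = Experiment$new(task = "iris", learner = "classif.rpart")
e$train(row_ids = 1:120)

state_before = e$state  # state after the training step
e$train_set             # row ids used for training

# Assigning new row ids is documented to reset the experiment to the
# state before the training step, so the model must be fitted again.
e$train_set = 1:100
state_after = e$state

print(state_before)
print(state_after)
```

Because state is an ordered factor, the two values can be compared directly, e.g. with state_after < state_before.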

Methods

  • train(row_ids = NULL, ctrl = list()) (integer() | character(), list()) -> self Fits the stored Learner on the row_ids of the Task and stores the model inside the Learner object. If no row_ids are provided, trains the model on all rows of the Task with row role "use". The fitted model can be accessed via $model.

  • predict(row_ids = NULL, newdata = NULL, ctrl = list()) (integer() | character(), data.frame(), list()) -> self Uses the previously fitted model to predict new observations. New observations are either addressed as row_ids referencing rows in the stored task, or as data.frame() via newdata. The latter fuses the new observations with the stored Task, and thereby mutates the Experiment. To avoid any side effects, it is advised to clone the experiment first. The resulting predictions are stored internally as a Prediction object and can be accessed via $prediction.

  • score(measures = NULL, ctrl = list()) (list of Measure, list()) -> self Quantifies the stored predictions using the provided list of Measure. If measures is NULL, the default measures that come with the Task are used. The performance values are stored internally and can be accessed via $performance.

  • log(steps = c("train", "predict")) character(1) -> Log Returns a Log for specified steps.

  • run(ctrl = list()) list() -> self Runs the steps $train(), $predict() and $score().
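
The advice to clone the experiment before predicting on newdata might look like the sketch below. It assumes the Experiment API documented on this page (mlr3 0.1.0-9000). The objects newdata and e2 are hypothetical names, and the use of deep = TRUE is an assumption: whether a shallow clone would suffice depends on how the class implements cloning.

```r
library(mlr3)

# Assumes the Experiment class of mlr3 0.1.0-9000 as documented on this page.
e = Experiment$new(task = "iris", learner = "classif.rpart")
e$train(row_ids = 1:120)

# Hypothetical new observations: the last 30 rows of iris as a data.frame.
newdata = iris[121:150, ]

# Predict on a clone so that the Task stored in `e` is not mutated.
# deep = TRUE is an assumption made here to also copy nested R6 objects.
e2 = e$clone(deep = TRUE)
e2$predict(newdata = newdata)
e2$prediction

# The original experiment should still be in the state after training.
e$state
e2$state
```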

Internal Data Storage

All data is stored in the slot data as a named list(). Directly accessing the elements is not recommended, but sometimes required, especially if you aim to extend mlr3. The data object contains the following items:

  • task :: Task A clone of the Task which was provided during construction. Also accessible via e$task.

  • learner :: Learner A clone of the Learner which was provided during construction. If the experiment has already been trained, e$learner$model contains the fitted model.

  • resampling :: Resampling Is NULL prior to calling $train(). If the experiment is constructed manually (i.e., not via resample() or benchmark()), a ResamplingCustom object is stored. The combination of resampling and iteration (next item) is used to extract the training and test set indices. These are directly accessible via e$train_set and e$test_set.

  • iteration :: integer(1) Refers to the iteration number of the stored Resampling object. If the experiment is constructed manually, this is always 1, as there is only one training-test split.

  • train_log :: data.table::data.table() Log for the training step. May be NULL if no encapsulation has been enabled via mlr_control().

  • train_time :: numeric(1) Elapsed time during train in seconds with up to millisecond accuracy (cf. proc.time()).

  • predict_log :: data.table::data.table() Log for the predict step. May be NULL if no encapsulation has been enabled via mlr_control().

  • predict_time :: numeric(1) Elapsed time during predict in seconds with up to millisecond accuracy (cf. proc.time()).

  • prediction :: Prediction Prediction as returned by the Learner's new_prediction() method.

  • measures :: list() of Measure Measures which were used for performance assessment.

  • performance :: named numeric() Aggregated scores returned by the measures, named with measure ids.

  • score_time :: numeric(1) Elapsed time during score in seconds with up to millisecond accuracy (cf. proc.time()).
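
For readers extending mlr3, the relation between the raw data slot and the public fields can be inspected roughly as follows. This is a sketch that assumes the Experiment API documented on this page (mlr3 0.1.0-9000). It only reads the items listed above and does not modify them.

```r
library(mlr3)

# Assumes the Experiment class of mlr3 0.1.0-9000 as documented on this page.
e = Experiment$new(task = "iris", learner = "classif.rpart")
e$train(row_ids = 1:120)
e$predict(row_ids = 121:150)
e$score()

# Raw storage: a named list holding the items described above.
names(e$data)
e$data$train_time
e$data$performance

# The same information through the recommended public fields.
e$timings
e$performance
```

Reading through the public fields is preferable in user code, since the layout of the data slot is an internal detail.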

Examples

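# Construct an experiment from dictionary keys for the task and the learner
e = Experiment$new(task = "iris", learner = "classif.rpart")
print(e)
e$state

# Fit the learner on the first 120 rows of the task
e$train(row_ids = 1:120)
print(e)
e$state
e$model

# Predict the remaining 30 rows
e$predict(row_ids = 121:150)
print(e)
e$state
e$prediction

# Score the predictions with the default measures of the task
e$score()
print(e)
e$state
e$performance

# Row ids used for training and prediction
e$train_set
e$test_set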
