mlr3pipelines (version 0.1.2)

PipeOpTaskPreprocSimple: PipeOpTaskPreprocSimple

Description

Base class for handling many "preprocessing" operations that perform essentially the same operation during training and prediction. Instead of implementing a $train_task() and a $predict_task() operation, only a $get_state() and a $transform() operation need to be defined, both of which take one argument: a Task.

Alternatively, analogously to the PipeOpTaskPreproc approach of offering $train_dt()/$predict_dt(), the $get_state_dt() and $transform_dt() functions may be implemented.

$get_state() must not change its input value in-place and must return a value that will be written into $state (the returned value must not be NULL). $transform() should modify its argument in-place; it is called during both training and prediction.

This inherits from PipeOpTaskPreproc and behaves essentially the same.

Format

Abstract R6Class inheriting from PipeOpTaskPreproc/PipeOp.

Construction

PipeOpTaskPreprocSimple$new(id, param_set = ParamSet$new(), param_vals = list(), can_subset_cols = TRUE, packages = character(0), task_type = "Task")

(Construction is identical to PipeOpTaskPreproc.)

  • id :: character(1) Identifier of resulting object. See $id slot of PipeOp.

  • param_set :: ParamSet Parameter space description. This should be created by the subclass and given to super$initialize().

  • param_vals :: named list List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list().

  • can_subset_cols :: logical(1) Whether to add the affect_columns parameter, which lets the user limit the columns that are modified by the PipeOpTaskPreprocSimple. This should generally be FALSE if the operation adds or removes rows from the Task, and TRUE otherwise. Default is TRUE.

  • packages :: character Set of all required packages for the PipeOp's $train and $predict methods. See $packages slot. Default is character(0).

  • task_type :: character(1) The class of Task that should be accepted as input and will be returned as output. This should generally be a character(1) identifying a type of Task, e.g. "Task", "TaskClassif" or "TaskRegr" (or another subclass introduced by other packages). Default is "Task".

Input and Output Channels

Input and output channels are inherited from PipeOpTaskPreproc.

The output during training and prediction is the Task, modified by $transform() or $transform_dt().

State

The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.

Parameters

The parameters are the parameters inherited from PipeOpTaskPreproc.

Internals

PipeOpTaskPreprocSimple is an abstract class inheriting from PipeOpTaskPreproc and implementing the $train_task() and $predict_task() functions. A subclass of PipeOpTaskPreprocSimple may implement the functions $get_state() and $transform(), or alternatively the functions $get_state_dt() and $transform_dt() (as well as $select_cols(), in the latter case). This works by having the default implementations of $get_state() and $transform() call $get_state_dt() and $transform_dt().
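
The control flow described above can be sketched without mlr3pipelines itself. The following is a minimal illustration, not the package's implementation: SimpleBaseSketch and CenterSketch are hypothetical names, the sketch operates on plain data.frames rather than Task objects (so the in-place modification of a Task is not modelled), and the choice of "all numeric columns" in select_cols() is an assumption made for the example. It only shows that training calls the state-extracting method and then the transformation, that prediction calls the transformation alone, and that the default $get_state() and $transform() delegate to their _dt counterparts.

```r
library(R6)

# Hypothetical stand-in for the abstract base class (data.frames instead of Tasks).
SimpleBaseSketch = R6Class("SimpleBaseSketch",
  public = list(
    state = NULL,

    # Training: compute the state first, then apply the shared transformation.
    train_task = function(data) {
      self$state = self$get_state(data)
      self$transform(data)
    },

    # Prediction: only the transformation runs, using the stored state.
    predict_task = function(data) {
      self$transform(data)
    },

    # Columns handed to the *_dt() methods (assumption: all numeric columns).
    select_cols = function(data) {
      names(data)[vapply(data, is.numeric, logical(1))]
    },

    # Default: delegate to get_state_dt() on the selected columns.
    get_state = function(data) {
      self$get_state_dt(data[, self$select_cols(data), drop = FALSE])
    },

    # Default: delegate to transform_dt() and write the result back.
    transform = function(data) {
      cols = self$select_cols(data)
      if (!length(cols)) {
        return(data)
      }
      data[cols] = self$transform_dt(data[, cols, drop = FALSE])
      data
    },

    # Fallbacks: an empty state, and an abstract transformation.
    get_state_dt = function(dt) list(),
    transform_dt = function(dt) stop("abstract")
  )
)

# Subclass overriding only the *_dt() pair: centre columns on the training means.
CenterSketch = R6Class("CenterSketch",
  inherit = SimpleBaseSketch,
  public = list(
    # Returns the state; does not assign to self$state itself.
    get_state_dt = function(dt) {
      list(means = vapply(dt, mean, numeric(1)))
    },
    # Same code path for training and prediction; reads the stored means.
    transform_dt = function(dt) {
      for (col in names(dt)) {
        dt[[col]] = dt[[col]] - self$state$means[[col]]
      }
      dt
    }
  )
)

op = CenterSketch$new()
train_out = op$train_task(data.frame(x = c(1, 2, 3), y = c(10, 20, 30)))
pred_out = op$predict_task(data.frame(x = 5, y = 0))
```

In this sketch the prediction data are shifted by the means computed during training, not by their own means, which is the reason the state is computed once and then reused by the shared transformation.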

Fields

Fields inherited from PipeOp.

Methods

Methods inherited from PipeOpTaskPreproc, as well as:

  • get_state(task) (Task) -> named list Create something that will be stored in $state during the training phase of PipeOpTaskPreprocSimple. The state can then influence the $transform() function. Note that $get_state() must return the state, and should not store it in $state. It is not strictly necessary to implement either $get_state() or $get_state_dt(); if they are not implemented, the state will be stored as list(). This method can optionally be overloaded when inheriting from PipeOpTaskPreprocSimple, together with $transform(); alternatively, $get_state_dt() (optional) and $transform_dt() (and possibly $select_cols(), from PipeOpTaskPreproc) can be overloaded.

  • transform(task) (Task) -> Task Predict on new data in task, possibly using the stored $state. task should not be cloned; instead, it should be changed in-place. This method is called during both the training and the prediction phase, and should behave essentially the same independently of phase. (If this is incongruent with the functionality to be implemented, then the class should inherit from PipeOpTaskPreproc, not from PipeOpTaskPreprocSimple.) This method can be overloaded when inheriting from PipeOpTaskPreprocSimple, optionally with $get_state(); alternatively, $get_state_dt() (optional) and $transform_dt() (and possibly $select_cols(), from PipeOpTaskPreproc) can be overloaded.

  • get_state_dt(dt) (data.table) -> named list Create something that will be stored in $state during the training phase of PipeOpTaskPreprocSimple. The state can then influence the $transform_dt() function. Note that $get_state_dt() must return the state, and should not store it in $state. If neither $get_state() nor $get_state_dt() is overloaded, the state will be stored as list(). This method can optionally be overloaded when inheriting from PipeOpTaskPreprocSimple, together with $transform_dt() (and optionally $select_cols(), from PipeOpTaskPreproc); alternatively, $get_state() (optional) and $transform() can be overloaded.

  • transform_dt(dt) (data.table) -> data.table | data.frame | matrix Predict on new data in dt, possibly using the stored $state. A transformed object must be returned that can be converted to a data.table using as.data.table. dt does not need to be copied deliberately; it is possible and encouraged to change it in-place. This method is called during both the training and the prediction phase, and should behave essentially the same independently of phase. (If this is incongruent with the functionality to be implemented, then the class should inherit from PipeOpTaskPreproc, not from PipeOpTaskPreprocSimple.) This method can be overloaded when inheriting from PipeOpTaskPreprocSimple, optionally with $get_state_dt() (and optionally $select_cols(), from PipeOpTaskPreproc); alternatively, $get_state() (optional) and $transform() can be overloaded.

See Also

Other mlr3pipelines backend related: Graph, PipeOpTaskPreproc, PipeOp, mlr_pipeops