mlr_pipeops_colapply: PipeOpColApply

Description

Applies a function to each column of a task. Use the affect_columns parameter inherited from PipeOpTaskPreproc to limit the columns this function should be applied to. This can be used for simple parameter transformations or type conversions (e.g. as.numeric).

The same function is applied during training and prediction. An important requirement for machine learning preprocessing is that, during prediction, the preprocessing of each data row is independent of all other rows. The applicator function should therefore always return a vector / list in which each result component depends only on the corresponding input component and not on any other component. As a rule of thumb, if the function f generates output different from Vectorize(f), it should not be used as applicator.
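The rule of thumb above can be checked directly in base R. The following is a minimal sketch; the helper names center and double_it are made up for illustration and are not part of the package.

```r
# Centering uses the mean of the whole column, so each output value depends
# on every input value.
center = function(x) x - mean(x)
# Doubling handles each element independently of the others.
double_it = function(x) x * 2

x = c(1, 2, 6)

# Vectorize() calls the wrapped function once per element (via mapply()),
# so a row-independent function gives the same result with or without it.
identical(double_it(x), Vectorize(double_it)(x))  # TRUE: fine as applicator
identical(center(x), Vectorize(center)(x))        # FALSE: not row-independent
```

A function like center would produce different values for the same row depending on which other rows happen to be present at prediction time, which is why it should not be used as applicator.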

Format

R6Class object inheriting from PipeOpTaskPreproc/PipeOp.

Construction

PipeOpColApply$new(id = "colapply", param_vals = list())

id :: character(1) Identifier of resulting object, default "colapply".
param_vals :: named list List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().

Input and Output Channels

Input and output channels are inherited from PipeOpTaskPreproc.

The output is the input Task with features changed according to the applicator parameter.

State

The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:

emptydt :: data.table An empty data.table with columns of names and types from output features after training. This is used to produce a correct type conversion during prediction, even when the input has zero length and applicator is therefore not called.

Parameters

The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:

applicator :: function Function to apply to each column of the task. The return value must have the same length as the input, i.e. the function must vectorize over its input. A typical example is as.numeric. Use Vectorize to create a vectorizing function from a function that ordinarily takes only a single element as input. The applicator is not called during prediction if the input task has no rows; instead, the types of affected features are changed to the result types of the applicator call during training. Initialized to the identity() function.

Internals

PipeOpColApply cannot inherit from PipeOpTaskPreprocSimple, because if applicator is given and the prediction data has 0 rows, the resulting data.table has no way of knowing what the column types should be. Column type conformity between training and prediction is enforced by saving a copy of an empty data.table in the $state$emptydt slot.
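The zero-row case described above can be tried out roughly as follows. This is a sketch that assumes mlr3 and mlr3pipelines are installed and that a Task can be reduced to zero rows with $filter(integer(0)); the variable name empty_task is only illustrative.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
poca = po("colapply", applicator = as.character)

# Training converts the numeric features to character.
trained = poca$train(list(task))[[1]]

# Zero-row copy of the task (assumption: filtering to no row ids is allowed).
empty_task = task$clone()$filter(integer(0))
predicted = poca$predict(list(empty_task))[[1]]

# The feature types of the empty prediction output should match the
# feature types of the training output.
trained$feature_types
predicted$feature_types
```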

Fields

Only fields inherited from PipeOpTaskPreproc/PipeOp.

Methods

Only methods inherited from PipeOpTaskPreproc/PipeOp.

Other PipeOps: PipeOpEnsemble, PipeOpImpute, PipeOpTaskPreproc, PipeOp, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_collapsefactors, mlr_pipeops_copy, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encode, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputehist, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputenewlvl, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_mutate, mlr_pipeops_nop, mlr_pipeops_pca, mlr_pipeops_quantilebin, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_scale, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_unbranch, mlr_pipeops_yeojohnson, mlr_pipeops

Examples


# NOT RUN {
library("mlr3")
library("mlr3pipelines")  # provides po() and selector_grep()

task = tsk("iris")
poca = po("colapply", applicator = as.character)
poca$train(list(task))[[1]]  # types are converted

# function that does not vectorize
f = function(x) {
  # we could use `ifelse` here, but that is not the point
  if (x > 1) {
    "a"
  } else {
    "b"
  }
}
poca$param_set$values$applicator = Vectorize(f)
poca$train(list(task))[[1]]$data()

# only affect Petal.* columns:
poca$param_set$values$affect_columns = selector_grep("^Petal")
poca$train(list(task))[[1]]$data()
# }
