mlr_pipeops_colapply
PipeOpColApply
Applies a function to each column of a task. Use the affect_columns parameter inherited from PipeOpTaskPreproc to limit the columns this function should be applied to. This can be used for simple parameter transformations or type conversions (e.g. as.numeric).
The same function is applied during training and prediction. An important requirement for machine learning preprocessing is that during the prediction phase, the preprocessing of each data row is independent of all other rows. Therefore, the applicator function should always return a vector / list in which each result component depends only on the corresponding input component and not on other components. As a rule of thumb, if the function f generates output different from Vectorize(f), it should not be used as applicator.
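The rule of thumb can be checked directly in base R. The sketch below is only an illustration: the helper functions halve and center are hypothetical names that do not appear in the package. It compares each function with its Vectorize()d counterpart on a small numeric vector.

```r
# Row-wise function: each output element depends only on the matching input element.
halve = function(x) x / 2

# Not row-wise: every output element depends on the mean of the whole column.
center = function(x) x - mean(x)

x = c(1, 2, 3, 4)

# Vectorize() calls the function once per element, which exposes
# any dependence on other elements of the input.
identical(halve(x), Vectorize(halve)(x))    # TRUE: safe to use as applicator
identical(center(x), Vectorize(center)(x))  # FALSE: should not be used as applicator
```

A function like center gives different results depending on which rows happen to be present, which is exactly what has to be avoided during prediction.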
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpColApply$new(id = "colapply", param_vals = list())
id :: character(1)
Identifier of resulting object, default "colapply".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
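As a brief illustration of the constructor signature above, the following sketch sets the applicator through param_vals at construction time. The id "to_character" is an arbitrary example value, and the snippet assumes that the mlr3 and mlr3pipelines packages are installed.

```r
library("mlr3")
library("mlr3pipelines")

# Direct construction; param_vals overrides the default applicator (identity).
poca = PipeOpColApply$new(
  id = "to_character",  # arbitrary example id
  param_vals = list(applicator = as.character)
)

poca$id
poca$param_set$values$applicator
```

The po("colapply", ...) shortcut used in the Examples section is a more compact way to obtain such an object.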
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with features changed according to the applicator parameter.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
emptydt :: data.table
An empty data.table with columns of names and types from output features after training. This is used to produce a correct type conversion during prediction, even when the input has zero length and applicator is therefore not called.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
applicator :: function
Function to apply to each column of the task. The return value must have the same length as the input, i.e. vectorize over the input. A typical example would be as.numeric. Use Vectorize to create a vectorizing function from any function that ordinarily only takes one element input. The applicator is not called during prediction if the input task has no rows; instead the types of affected features are changed to the result types of the applicator call during training. Initialized to the identity() function.
Internals
PipeOpColApply cannot inherit from PipeOpTaskPreprocSimple, because if applicator is given and the prediction data has 0 rows, then the resulting data.table does not know what the column types should be. Column type conformity between training and prediction is enforced by simply saving a copy of an empty data.table in the $state$emptydt slot.
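The reasoning above can be illustrated without the package. The following is a minimal base R sketch of the idea, not the actual implementation of PipeOpColApply: the names train_in, train_out, prototype, predict_in and predict_out are hypothetical, and plain data.frames stand in for the data.tables used internally.

```r
# Hypothetical training data with two numeric feature columns.
train_in = data.frame(a = c(1.5, 2.5), b = c(3, 4))

# Apply the column function (as.character here) to every column.
train_out = as.data.frame(lapply(train_in, as.character), stringsAsFactors = FALSE)

# Keep a zero-row copy: it holds no data but remembers column names and types.
prototype = train_out[0, , drop = FALSE]

# Zero-row prediction input: there is nothing to apply the function to,
# so the stored prototype supplies the converted column types directly.
predict_in = train_in[0, , drop = FALSE]
if (nrow(predict_in) == 0) {
  predict_out = prototype
} else {
  predict_out = as.data.frame(lapply(predict_in, as.character), stringsAsFactors = FALSE)
}

sapply(predict_out, class)
```

Storing only the zero-row prototype keeps the state small while still fixing the output types for every later prediction call.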
Fields
Only fields inherited from PipeOpTaskPreproc/PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See Also
Other PipeOps: PipeOpEnsemble, PipeOpImpute, PipeOpTaskPreproc, PipeOp, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_collapsefactors, mlr_pipeops_copy, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encode, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputehist, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputenewlvl, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_mutate, mlr_pipeops_nop, mlr_pipeops_pca, mlr_pipeops_quantilebin, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_scale, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_unbranch, mlr_pipeops_yeojohnson, mlr_pipeops
Examples
library("mlr3")
library("mlr3pipelines")  # provides po() and selector_grep()

task = tsk("iris")
poca = po("colapply", applicator = as.character)
poca$train(list(task))[[1]]  # types are converted

# function that does not vectorize
f = function(x) {
  # we could use `ifelse` here, but that is not the point
  if (x > 1) {
    "a"
  } else {
    "b"
  }
}
poca$param_set$values$applicator = Vectorize(f)
poca$train(list(task))[[1]]$data()

# only affect Petal.* columns:
poca$param_set$values$affect_columns = selector_grep("^Petal")
poca$train(list(task))[[1]]$data()