Collapses levels of features of type factor and ordered: the rarest levels in the
training samples are collapsed until target_level_count levels remain. Levels whose prevalence is above
no_collapse_above_prevalence are retained, however. For factor features, rare levels are collapsed into the next larger level;
for ordered features, rare levels are collapsed into the neighbouring level, whichever has fewer samples.
Levels not seen during training are not touched during prediction; it is therefore useful to combine this with
PipeOpFixFactors.
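The collapsing rule for unordered factors described above can be sketched in base R. This is only an illustration of the documented behaviour, not the package's implementation: the helper name collapse_rare_levels is hypothetical, the handling of ties and missing values is an assumption, and ordered features are not covered.

```r
# Hypothetical helper, not part of mlr3pipelines: repeatedly merge the rarest
# level of an unordered factor into the next larger level until
# `target_level_count` levels remain, or until the rarest remaining level has
# a prevalence above `no_collapse_above_prevalence`.
collapse_rare_levels = function(x, target_level_count = 2,
                                no_collapse_above_prevalence = 1) {
  x = droplevels(x)
  while (nlevels(x) > target_level_count) {
    counts = sort(table(x))  # ascending, so the rarest level comes first
    if (counts[[1]] / length(x) > no_collapse_above_prevalence) break
    rarest = names(counts)[1]
    next_larger = names(counts)[2]
    # Assigning an already existing level name merges the two levels.
    levels(x)[levels(x) == rarest] = next_larger
  }
  x
}

x = factor(rep(c("a", "b", "c", "d"), times = c(6, 4, 2, 1)))
collapsed = collapse_rare_levels(x)
table(collapsed)

# With a prevalence threshold, only sufficiently rare levels are merged.
partly_collapsed = collapse_rare_levels(x, no_collapse_above_prevalence = 0.1)
table(partly_collapsed)
```

In this toy input, "d" is first merged into "c", and the merged "c" is then merged into "b". With the threshold of 0.1, only "d" is rare enough to be merged.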
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpCollapseFactors$new(id = "collapsefactors", param_vals = list())
id :: character(1)
Identifier of resulting object, default "collapsefactors".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with the rare levels of affected factor and ordered features collapsed.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
collapse_map :: named list of named list of character
List of factor level maps. For each factor, collapse_map contains a named list that indicates which levels
of the input task get mapped to which levels of the output task. If collapse_map has an entry feat_1 with
an entry a = c("x", "y"), it means that levels "x" and "y" get collapsed to level "a" in feature "feat_1".
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
no_collapse_above_prevalence :: numeric(1)
Fraction of samples below which factor levels get collapsed. Default is 1, which causes all levels
to be collapsed until target_level_count remain.
target_level_count :: integer(1)
Number of levels to retain. Default is 2.
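As a minimal usage sketch, both parameters can be set when the PipeOp is constructed through po(). The toy data, the task id "demo" and the column names y and feat_1 are made up for illustration; the example assumes that the mlr3 and mlr3pipelines packages are installed.

```r
library("mlr3")
library("mlr3pipelines")

# Toy data (illustrative only): one factor feature with two rare levels.
dat = data.frame(
  y = factor(rep(c("pos", "neg"), length.out = 13)),
  feat_1 = factor(rep(c("a", "b", "c", "d"), times = c(6, 4, 2, 1)))
)
task = TaskClassif$new("demo", backend = dat, target = "y")

op = po("collapsefactors",
  target_level_count = 2,
  no_collapse_above_prevalence = 1
)
collapsed_task = op$train(list(task))[[1]]

op$state$collapse_map            # mapping from input levels to output levels
collapsed_task$levels("feat_1")  # levels remaining after collapsing
```

Lowering no_collapse_above_prevalence below 1 protects frequent levels, so more than target_level_count levels may remain.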
Makes use of the fact that levels(fact_var) = list(target1 = c("source1", "source2"), target2 = "source3") causes
renaming of levels "source1" and "source2" both to "target1", and of level "source3" to "target2".
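The list form of the levels replacement function can be tried directly in base R. The factor and the level names below are made up for illustration; the list has the same shape as one entry of collapse_map (target level name = source levels).

```r
fact_var = factor(c("x", "y", "z", "x", "z"))

# Each list name is a target level; its value holds the source levels that are
# renamed to it. Every existing level is listed here, so nothing becomes NA.
levels(fact_var) = list(a = c("x", "y"), z = "z")

levels(fact_var)
as.character(fact_var)
```

After the assignment, the factor has the two levels "a" and "z", and every former "x" or "y" value is now "a".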
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr3book.mlr-org.com/list-pipeops.html
Other PipeOps:
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreprocSimple,
PipeOpTaskPreproc,
PipeOp,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encode,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_scale,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson,
mlr_pipeops
library("mlr3")