
Encodes columns of type factor, character and ordered.
Possible encodings are "one-hot" encoding, as well as encoding according to stats::contr.helmert(), stats::contr.poly(), stats::contr.sum() and stats::contr.treatment().
Newly created columns are named via pattern [column-name].[x] where x is the respective factor level for "one-hot" and "treatment" encoding, and an integer sequence otherwise.
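The naming rule above can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: encoded_names() is a hypothetical helper introduced only for this illustration, and the column name "y" is an arbitrary example.

```r
# Hypothetical helper (not part of mlr3pipelines): derive the names of the
# columns that replace `colname`, given the contrast matrix used for encoding.
encoded_names = function(colname, contrast_matrix) {
  suffix = colnames(contrast_matrix)
  if (is.null(suffix)) {
    # no level names on the columns: fall back to an integer sequence
    suffix = as.character(seq_len(ncol(contrast_matrix)))
  }
  paste(colname, suffix, sep = ".")
}

lvls = c("a", "b", "c")

# one-hot style matrix: contr.treatment() with contrasts = FALSE keeps all levels
one_hot = stats::contr.treatment(lvls, contrasts = FALSE)
names_one_hot = encoded_names("y", one_hot)

# Helmert contrasts carry no column names, so integers are used instead
helmert = stats::contr.helmert(lvls)
names_helmert = encoded_names("y", helmert)

print(names_one_hot)
print(names_helmert)
```

The one-hot names carry the factor levels as suffixes, while the Helmert names are numbered, matching the pattern described above.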
Use the PipeOpTaskPreproc $affect_columns functionality to only encode a subset of columns, or only encode columns of a certain type.
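As a hedged sketch of restricting the encoding to a subset of columns: the example below assumes that the mlr3 and mlr3pipelines packages are installed and that selector_name() from mlr3pipelines is used to build the column selector. The task and its column names (target, f1, f2) are made up for illustration.

```r
library("mlr3")
library("mlr3pipelines")

# made-up data: two factor features, only one of which should be encoded
data = data.table::data.table(
  target = factor(c("a", "b", "a")),
  f1 = factor(c("u", "v", "w")),
  f2 = factor(c("p", "q", "p"))
)
task = TaskClassif$new("demo", data, target = "target")

# only the column named "f1" is affected; "f2" is passed through unchanged
poe = po("encode", affect_columns = selector_name("f1"))
encoded_task = poe$train(list(task))[[1]]

print(encoded_task$feature_names)
```

To select by column type rather than by name, selector_type() could be used in the same position, for example selector_type("factor").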
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpEncode$new(id = "encode", param_vals = list())
id :: character(1)
Identifier of resulting object, default "encode".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected factor, character or ordered features encoded according to the method parameter.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
contrasts :: named list of matrix
List of contrast matrices, one for each affected discrete feature. The rows of each matrix correspond to (training task) levels, the columns to the new columns that replace the old discrete feature. See stats::contrasts.
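To make the row/column layout of a contrast matrix concrete, here is a minimal base R sketch. It shows how such a matrix can map the values of a factor to new columns by row lookup. This illustrates the idea only and is not claimed to be the package's internal code; the factor x is made-up data.

```r
# made-up feature with three training levels
x = factor(c("b", "a", "c", "a"), levels = c("a", "b", "c"))

# treatment contrasts: one row per level, one column per new feature
con = stats::contr.treatment(levels(x))
print(con)

# look up the row of the contrast matrix that belongs to each observation
encoded = con[as.character(x), , drop = FALSE]
rownames(encoded) = NULL
print(encoded)
```

Each observation is replaced by the row of the matrix belonging to its level, so the first level ("a") maps to all zeros under treatment contrasts.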
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
method :: character(1)
Initialized to "one-hot". One of:
"one-hot": create a new column for each factor level.
"treatment": create n - 1 columns, leaving out the first factor level of each factor variable (see stats::contr.treatment()).
"helmert": create columns according to Helmert contrasts (see stats::contr.helmert()).
"poly": create columns with contrasts based on orthogonal polynomials (see stats::contr.poly()).
"sum": create columns with contrasts summing to zero (see stats::contr.sum()).
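The properties named in this list can be checked directly on the underlying stats functions. The sketch below uses base R only and assumes that "one-hot" corresponds to stats::contr.treatment() with contrasts = FALSE; the level names are arbitrary.

```r
lvls = c("low", "mid", "high")

# one contrast matrix per method, built directly with the stats functions
contrasts_by_method = list(
  "one-hot" = stats::contr.treatment(lvls, contrasts = FALSE),
  treatment = stats::contr.treatment(lvls),
  helmert = stats::contr.helmert(lvls),
  poly = stats::contr.poly(length(lvls)),
  sum = stats::contr.sum(lvls)
)

# number of new columns each method produces for a three-level factor
n_cols = vapply(contrasts_by_method, ncol, integer(1))
print(n_cols)

# "sum" contrasts: every column sums to zero over the levels
sum_col_totals = colSums(contrasts_by_method$sum)
print(sum_col_totals)

# "poly" contrasts: columns are orthonormal, so the cross product is the identity
poly_cross = crossprod(contrasts_by_method$poly)
print(round(poly_cross, 10))
```

Only the one-hot variant keeps one column per level; the other four methods produce one column fewer than there are levels.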
Uses the stats::contrasts functions. This is relatively inefficient for features with a large number of levels.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr3book.mlr-org.com/list-pipeops.html
Other PipeOps:
PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreprocSimple, PipeOpTaskPreproc, PipeOp, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_scale, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson, mlr_pipeops
library("mlr3")
library("mlr3pipelines")  # provides po() and PipeOpEncode

# toy task: "x" is the target, "y" is the factor feature that gets encoded
data = data.table::data.table(x = factor(letters[1:3]), y = factor(letters[1:3]))
task = TaskClassif$new("task", data, "x")

poe = po("encode")

# poe is initialized with encoding: "one-hot"
poe$train(list(task))[[1]]$data()

# other kinds of encoding:
poe$param_set$values$method = "treatment"
poe$train(list(task))[[1]]$data()

poe$param_set$values$method = "helmert"
poe$train(list(task))[[1]]$data()

poe$param_set$values$method = "poly"
poe$train(list(task))[[1]]$data()

poe$param_set$values$method = "sum"
poe$train(list(task))[[1]]$data()