Creates a Graph that can be used to robustify any subsequent learner. Performs the following steps:

- Drops empty factor levels using PipeOpFixFactors.
- Imputes numeric features using PipeOpImputeHist and PipeOpMissInd.
- Imputes factor features using PipeOpImputeOOR.
- Encodes factors using one-hot encoding. Factors with a cardinality > max_cardinality are collapsed using PipeOpCollapseFactors.
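The base-R sketch below illustrates what the first three steps do to a single data.frame. The helpers (drop_empty_levels, impute_hist, impute_oor_factor, robustify_sketch) are hypothetical stand-ins, not the mlr3pipelines implementations. They ignore the train/predict distinction the real PipeOps make, encode missing indicators as 0/1 integers named missing_<column>, and use ".MISSING" as the out-of-range factor level. All three of these choices are assumptions.

```r
# Hypothetical base-R stand-ins for the first robustify steps
# (no train/predict distinction, unlike the real PipeOps).

# Like PipeOpFixFactors: drop factor levels that never occur.
drop_empty_levels = function(df) {
  is_fct = vapply(df, is.factor, logical(1))
  df[is_fct] = lapply(df[is_fct], droplevels)
  df
}

# Like PipeOpImputeHist: pick a histogram bin with probability
# proportional to its count, then draw uniformly inside that bin.
impute_hist = function(x) {
  miss = is.na(x)
  if (!any(miss) || all(miss)) return(x)
  h = hist(x[!miss], plot = FALSE)
  bin = sample(length(h$counts), sum(miss), replace = TRUE, prob = h$counts)
  x[miss] = runif(sum(miss), h$breaks[bin], h$breaks[bin + 1])
  x
}

# Like PipeOpImputeOOR for factors: add an extra level for missing
# values (".MISSING" is an assumed level name).
impute_oor_factor = function(f) {
  if (!anyNA(f)) return(f)
  levels(f) = c(levels(f), ".MISSING")
  f[is.na(f)] = ".MISSING"
  f
}

robustify_sketch = function(df) {
  df = drop_empty_levels(df)
  for (nm in names(df)) {
    x = df[[nm]]
    if (is.numeric(x)) {
      # Record missingness before imputing (like PipeOpMissInd; 0/1 encoding assumed).
      if (anyNA(x)) df[[paste0("missing_", nm)]] = as.integer(is.na(x))
      df[[nm]] = impute_hist(x)
    } else if (is.factor(x)) {
      df[[nm]] = impute_oor_factor(x)
    }
  }
  df
}

set.seed(1)
df = data.frame(
  x = c(1.5, NA, 3.2, 4.8, NA, 2.1),
  g = factor(c("a", NA, "b", "a", "b", NA), levels = c("a", "b", "unused"))
)
out = robustify_sketch(df)
str(out)
```

The missing indicator must be computed before imputation. Once the gaps have been filled, there is no longer any record of which values were originally missing.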
The graph is built conservatively, i.e. the function always tries to ensure that everything works. If a learner is provided, some steps can be left out; e.g., if the learner can deal with factor variables, no encoding is performed.

All input arguments are cloned and have no references in common with the returned Graph.
pipeline_robustify(
task = NULL,
learner = NULL,
impute_missings = NULL,
factors_to_numeric = NULL,
max_cardinality = 1000,
ordered_action = "factor",
character_action = "factor",
POSIXct_action = "numeric"
)
Returns a Graph.

task (Task)
A Task to create a robustifying pipeline for. Optional; if omitted, the "worst possible" Task is assumed and the full pipeline is created.
learner (Learner)
A learner to create a robustifying pipeline for. Optional; if omitted, the "worst possible" Learner is assumed and a more conservative pipeline is built.
impute_missings (logical(1) | NULL)
Should missing values be imputed? Defaults to NULL, in which case missing values are imputed if the task has missing values (or factors that are not encoded to numerics) and the learner cannot handle them.
factors_to_numeric (logical(1) | NULL)
Should (ordered and unordered) factors be encoded? Defaults to NULL, in which case factors are encoded if the task has factors (or character columns that get converted to factor) and the learner cannot handle factors.
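The NULL defaults of impute_missings and factors_to_numeric can be read as two boolean rules. The sketch below presents one reading of the documented behavior, not the package's source. resolve_defaults() and its arguments are hypothetical. An omitted task (NULL) is treated as having both factors and missing values, and an omitted learner as handling neither.

```r
# Fallback for NULL (base R >= 4.4.0 also provides `%||%`).
or_else = function(x, default) if (is.null(x)) default else x

# Hypothetical helper: one reading of the documented NULL defaults.
# NULL inputs stand for an omitted task (assume the worst: factors and
# missing values present) or an omitted learner (assume it handles neither).
resolve_defaults = function(task_has_factors = NULL, task_has_missings = NULL,
                            learner_handles_factors = NULL,
                            learner_handles_missings = NULL) {
  has_fct = or_else(task_has_factors, TRUE)
  has_miss = or_else(task_has_missings, TRUE)
  ok_fct = or_else(learner_handles_factors, FALSE)
  ok_miss = or_else(learner_handles_missings, FALSE)

  # Encode only if there are factors the learner cannot handle.
  factors_to_numeric = has_fct && !ok_fct
  # Impute if there are missing values, or factors that stay factors,
  # and the learner cannot handle missing values.
  impute_missings = (has_miss || (has_fct && !factors_to_numeric)) && !ok_miss
  list(factors_to_numeric = factors_to_numeric, impute_missings = impute_missings)
}

# Worst case: nothing is known about the task or the learner.
str(resolve_defaults())
# A factor-capable learner on complete data that contains factors.
str(resolve_defaults(task_has_factors = TRUE, task_has_missings = FALSE,
                     learner_handles_factors = TRUE, learner_handles_missings = FALSE))
```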
max_cardinality (integer(1))
Maximum number of factor levels allowed; factors with more levels are collapsed using PipeOpCollapseFactors (see above). Default: 1000.
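Below is a rough base-R illustration of collapsing followed by one-hot encoding. collapse_rare() substitutes a simple frequency-based lumping scheme: it keeps the max_levels - 1 most frequent levels and merges the rest into a single "other" level. PipeOpCollapseFactors' exact merging strategy and level names may differ. one_hot() is likewise a hypothetical helper.

```r
# Hypothetical helper (a swapped-in lumping scheme, not PipeOpCollapseFactors):
# keep the (max_levels - 1) most frequent levels and merge all others
# into a single catch-all level.
collapse_rare = function(f, max_levels, other = "other") {
  if (nlevels(f) <= max_levels) return(f)
  counts = sort(table(f), decreasing = TRUE)
  keep = names(counts)[seq_len(max_levels - 1)]
  lvl = as.character(f)
  lvl[!is.na(lvl) & !(lvl %in% keep)] = other
  factor(lvl, levels = c(keep, other))
}

# Hypothetical helper: one 0/1 indicator column per factor level.
one_hot = function(f) {
  vapply(levels(f), function(l) as.integer(f == l), integer(length(f)))
}

f = factor(c("a", "a", "a", "b", "b", "c", "d"))
collapsed = collapse_rare(f, max_levels = 3)
levels(collapsed)  # "a" "b" "other"
oh = one_hot(collapsed)
colSums(oh)        # a = 3, b = 2, other = 2
```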
ordered_action (character(1))
How to handle ordered columns:
- "factor" (default) or "factor!": convert to factor columns;
- "numeric" or "numeric!": convert to numeric columns;
- "integer" or "integer!": convert to integer columns;
- "ignore" or "ignore!": ignore.

When task is given and has no ordered columns, or when learner is given and can handle ordered, then "factor", "numeric" and "integer" are treated like "ignore". This means that the exclamation point must be added to override Task or Learner properties when these are given. "ignore" and "ignore!" therefore behave identically; "ignore!" is only present for consistency.

When ordered features are converted to factor, they are treated like factor features further down in the pipeline and may eventually be converted to numerics, but in a different way: factors get one-hot encoded, whereas ordered_action = "numeric" converts ordered features to their integer-valued rank using as.numeric.
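The exclamation-point rule and the difference between the two numeric routes can be sketched in base R. resolve_action() is a hypothetical helper, not an mlr3pipelines function. It encodes the rule as stated above, with NULL standing for an omitted task or learner. The same rule applies to character_action and POSIXct_action.

```r
# Hypothetical helper encoding the "!" rule; NULL means the task or
# learner was not given.
resolve_action = function(action, task_has_type = NULL, learner_handles_type = NULL) {
  base = sub("!$", "", action)
  # A trailing "!" always wins over task and learner properties.
  if (endsWith(action, "!")) return(base)
  # Without "!", the action is dropped if the task has no such columns
  # or the learner can handle them natively.
  if (isFALSE(task_has_type) || isTRUE(learner_handles_type)) "ignore" else base
}

resolve_action("numeric", task_has_type = FALSE)       # "ignore"
resolve_action("numeric!", task_has_type = FALSE)      # "numeric"
resolve_action("factor", learner_handles_type = TRUE)  # "ignore"

# The "numeric" route for an ordered feature: as.numeric gives the rank.
x = factor(c("low", "high", "mid", "low"),
           levels = c("low", "mid", "high"), ordered = TRUE)
as.numeric(x)  # 1 3 2 1
# The "factor" route would instead yield one indicator column per level (3 here).
```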
character_action (character(1))
How to handle character columns:
- "factor" (default) or "factor!": convert to factor columns;
- "matrix" or "matrix!": use PipeOpTextVectorizer;
- "ignore" or "ignore!": ignore.

When task is given and has no character columns, or when learner is given and can handle character, then "factor" and "matrix" are treated like "ignore". This means that the exclamation point must be added to override Task or Learner properties when these are given. "ignore" and "ignore!" therefore behave identically; "ignore!" is only present for consistency.

When character columns are converted to factor, they are treated like factors further down in the pipeline and may eventually be converted to numerics using one-hot encoding.
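For the "matrix" option, PipeOpTextVectorizer computes a bag-of-words representation of the text. The base-R sketch below shows only the basic idea, using a naive whitespace tokenizer and raw term counts. The real PipeOp is considerably more configurable, and the variable names here are purely illustrative.

```r
# Under "factor": a character column simply becomes a factor
# (and may later be one-hot encoded).
colour = factor(c("red", "green", "red"))
nlevels(colour)  # 2

# Under "matrix": a naive bag-of-words sketch (whitespace tokens, raw
# counts), not PipeOpTextVectorizer's implementation.
docs = c("red apple", "green apple", "red red car")
tokens = strsplit(tolower(docs), "\\s+")
vocab = sort(unique(unlist(tokens)))
# One row per document, one column per vocabulary term.
dtm = t(vapply(tokens,
               function(tk) as.integer(table(factor(tk, levels = vocab))),
               integer(length(vocab))))
colnames(dtm) = vocab
dtm
```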
POSIXct_action (character(1))
How to handle POSIXct columns:
- "numeric" (default) or "numeric!": convert to numeric columns;
- "datefeatures" or "datefeatures!": use PipeOpDateFeatures;
- "ignore" or "ignore!": ignore.

When task is given and has no POSIXct columns, or when learner is given and can handle POSIXct, then "numeric" and "datefeatures" are treated like "ignore". This means that the exclamation point must be added to override Task or Learner properties when these are given. "ignore" and "ignore!" therefore behave identically; "ignore!" is only present for consistency.
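The two POSIXct conversions can be illustrated in base R. Assuming the "numeric" conversion uses as.numeric, it yields seconds since the Unix epoch. "datefeatures" extracts calendar components instead. date_features() is a hypothetical helper that produces only a few such columns; the exact set and names of the columns produced by PipeOpDateFeatures are not reproduced here.

```r
ts = as.POSIXct(c("2024-01-15 08:30:00", "2024-07-04 17:45:00"), tz = "UTC")

# "numeric": seconds since 1970-01-01 00:00:00 UTC (assuming as.numeric is used).
secs = as.numeric(ts)
as.numeric(as.POSIXct("1970-01-02 00:00:00", tz = "UTC"))  # 86400

# "datefeatures": hypothetical helper extracting a few calendar components.
date_features = function(x) {
  lt = as.POSIXlt(x, tz = "UTC")
  data.frame(
    year = lt$year + 1900L,  # POSIXlt counts years from 1900
    month = lt$mon + 1L,     # POSIXlt months are 0-based
    day_of_month = lt$mday,
    day_of_week = lt$wday,   # 0 = Sunday
    hour = lt$hour,
    minute = lt$min
  )
}
date_features(ts)
```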
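if (requireNamespace("rpart", quietly = TRUE)) {
  library(mlr3)
  library(mlr3pipelines)  # provides pipeline_robustify(), po(), %>>% and GraphLearner
  learner = lrn("regr.rpart")
  task = mlr_tasks$get("boston_housing")
  # Put the robustifying Graph in front of the learner it was built for
  gr = pipeline_robustify(task, learner) %>>% po("learner", learner)
  # Wrap the Graph as a learner and evaluate it on a holdout split
  resample(task, GraphLearner$new(gr), rsmp("holdout"))
}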