A Selector function is used by different PipeOps, most prominently PipeOpSelect and many PipeOps inheriting
from PipeOpTaskPreproc, to determine the subset of a Task's features to operate on.
Although a Selector is an ordinary function and can be written by hand, it is preferable to use the Selector constructors
shown here. Each constructor can be called with its arguments to create a Selector, which can then be passed to the
selector parameter of PipeOpSelect or to the affect_columns parameter of many PipeOpTaskPreproc-based PipeOps. See the documentation of those PipeOps for examples of this usage.
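As a minimal sketch of the two usages described above, assuming the mlr3 and mlr3pipelines packages are installed and using the built-in iris task (the choice of PipeOpScale and of the grep patterns is purely illustrative):

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")

# PipeOpSelect keeps only the features returned by the Selector.
po_sel = po("select", selector = selector_grep("^Sepal"))
sepal_task = po_sel$train(list(task))[[1]]
sepal_task$feature_names

# PipeOpScale inherits from PipeOpTaskPreproc: only the columns chosen by
# affect_columns are scaled, all other features pass through unchanged.
po_scale = po("scale", affect_columns = selector_grep("^Petal"))
scaled_task = po_scale$train(list(task))[[1]]
scaled_task$feature_names
```

In the first case the Selector decides which features survive; in the second it only restricts which features the preprocessing step touches.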
selector_all()
selector_none()
selector_type(types)
selector_grep(pattern, ignore.case = FALSE, perl = FALSE, fixed = FALSE)
selector_name(feature_names, assert_present = FALSE)
selector_invert(selector)
selector_intersect(selector_x, selector_y)
selector_union(selector_x, selector_y)
selector_setdiff(selector_x, selector_y)
selector_missing()
selector_cardinality_greater_than(min_cardinality)
types (character): Type of feature to select.
pattern (character(1)): grep pattern.
ignore.case (logical(1)): ignore case.
perl (logical(1)): perl regex.
fixed (logical(1)): fixed pattern instead of regex.
feature_names (character): Select features by exact name match.
assert_present (logical(1)): Throw an error if feature_names are not all present in the task being operated on.
min_cardinality (integer): Minimum number of levels required to be selected.
function: A Selector function that takes a Task and returns the feature names to be processed.
selector_all: selector_all selects all features.
selector_none: selector_none selects none of the features.
selector_type: selector_type selects features according to type. Legal types are listed in mlr_reflections$task_feature_types.
selector_grep: selector_grep selects features with names matching the grep() pattern.
selector_name: selector_name selects features with names matching exactly the names listed.
selector_invert: selector_invert inverts a given Selector: It always selects the features
that would be dropped by the other Selector, and drops the features that
would be kept.
selector_intersect: selector_intersect selects the intersection of two Selectors: Only features
selected by both Selectors are selected in the end.
selector_union: selector_union selects the union of two Selectors: Features
selected by either Selector are selected in the end.
selector_setdiff: selector_setdiff selects the setdiff of two Selectors: Features
selected by selector_x are selected, unless they are also selected
by selector_y.
selector_missing: selector_missing selects features with missing values.
selector_cardinality_greater_than: selector_cardinality_greater_than selects categorical features with cardinality
greater than a given threshold.
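The constructors listed above can be combined freely. The following is a rough sketch, assuming mlr3 and mlr3pipelines are installed; the data frame `toy` and its column names are hypothetical and were made up only for this illustration:

```r
library(mlr3)
library(mlr3pipelines)

# Hypothetical toy data, invented for this illustration.
toy = data.frame(
  y  = factor(c("a", "b", "a", "b")),
  f3 = factor(c("x", "y", "z", "x")),  # factor with three levels
  f2 = factor(c("u", "v", "u", "v")),  # factor with two levels
  n  = c(1, NA, 3, 4)                  # numeric with one missing value
)
task = as_task_classif(toy, target = "y")

# Features that contain missing values.
sel_na = selector_missing()
sel_na(task)

# Categorical features with more than two levels.
sel_card = selector_cardinality_greater_than(2)
sel_card(task)

# Factor features that are NOT high-cardinality.
sel_small = selector_setdiff(selector_type("factor"), sel_card)
sel_small(task)

# Features selected by either Selector.
sel_either = selector_union(sel_na, sel_card)
sel_either(task)

# Features selected by both Selectors.
sel_both = selector_intersect(selector_type("factor"), selector_grep("^f"))
sel_both(task)

# Everything except the feature named "n".
sel_rest = selector_invert(selector_name("n"))
sel_rest(task)
```

Because every constructor returns a plain function of the Task, the results can be nested to any depth without evaluating anything until a Task is supplied.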
A Selector is a function
that has one input argument (commonly named task). The function is called with the Task that a PipeOp
is operating on. The return value of the function must be a character vector that is a subset of the feature names present
in the Task.
For example, a Selector that selects all columns is
function(task) {
  task$feature_names
}
(This is the Selector returned by selector_all().) A Selector that selects
all columns that have names shorter than four letters would be:
function(task) {
  task$feature_names[
    nchar(task$feature_names) < 4
  ]
}
A Selector that selects only the column "Sepal.Length" (as in the iris task), if present, is
function(task) {
  intersect(task$feature_names, "Sepal.Length")
}
It is preferable to use the Selector construction functions like selector_type, selector_grep etc. if possible, instead of writing custom Selectors.
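When no constructor expresses the required rule, a hand-written Selector can still be used. The sketch below assumes mlr3 and mlr3pipelines are installed; `selector_mean_above` is a hypothetical helper written for this illustration and is not part of the package. It selects numeric features whose mean exceeds a threshold and returns a character vector that is a subset of the Task's feature names, as required:

```r
library(mlr3)
library(mlr3pipelines)

# Hypothetical constructor (not part of mlr3pipelines): returns a Selector
# that keeps numeric/integer features whose mean is above `threshold`.
selector_mean_above = function(threshold) {
  function(task) {
    ft = task$feature_types
    num = ft$id[ft$type %in% c("numeric", "integer")]
    dat = task$data(cols = num)
    means = vapply(dat, mean, numeric(1), na.rm = TRUE)
    # Always a character vector, also when no feature qualifies.
    names(dat)[means > threshold]
  }
}

task = tsk("iris")
sel = selector_mean_above(2)
sel(task)

# The custom Selector is passed to PipeOpSelect like any constructed one.
po_sel = po("select", selector = sel)
po_sel$train(list(task))[[1]]$feature_names
```

Wrapping the rule in a constructor that returns `function(task)` mirrors the built-in constructors, so the result can also be combined with them, for example via selector_union().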
Other Selectors:
mlr_pipeops_select
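library("mlr3")
library("mlr3pipelines")  # provides the selector_*() constructors

iris_task = tsk("iris")
bh_task = tsk("boston_housing")

# select all features
sela = selector_all()
sela(iris_task)
sela(bh_task)

# select features of type "factor"
self = selector_type("factor")
self(iris_task)
self(bh_task)

# select features whose names match the regular expression "a.*i"
selg = selector_grep("a.*i")
selg(iris_task)
selg(bh_task)

# select the features that selg does not select
selgi = selector_invert(selg)
selgi(iris_task)
selgi(bh_task)

# select features matched by selg or of type "factor"
selgf = selector_union(selg, self)
selgf(iris_task)
selgf(bh_task)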