Selector: Selector Functions

Description

A Selector function is used by different PipeOps, most prominently PipeOpSelect and many PipeOps inheriting from PipeOpTaskPreproc, to determine the subset of a Task's features to operate on.

Although a Selector is an ordinary function and can be written by hand, it is preferable to use the Selector constructors shown here. Each constructor is called with its arguments and returns a Selector, which can then be passed to the selector parameter of PipeOpSelect or to the affect_columns parameter of many PipeOpTaskPreproc-based PipeOps. See the documentation of those PipeOps for examples of this usage.
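
As a minimal sketch of the two uses named above, the following passes a Selector to the selector parameter of PipeOpSelect and to the affect_columns parameter of PipeOpScale. It assumes that mlr3 and mlr3pipelines are installed; the variable names are illustrative only.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# PipeOpSelect keeps only the features returned by the Selector.
po_select = po("select", selector = selector_grep("^Sepal"))
selected = po_select$train(list(task))[[1]]
selected$feature_names

# PipeOpScale only scales the features chosen through affect_columns;
# the remaining features are passed through unchanged.
po_scale = po("scale", affect_columns = selector_grep("^Petal"))
scaled = po_scale$train(list(task))[[1]]
scaled$feature_names
```

In the second case all four features remain in the output Task, but only the Petal columns are centered and scaled.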

Usage

selector_all()
selector_none()
selector_type(types)
selector_grep(pattern, ignore.case = FALSE, perl = FALSE, fixed = FALSE)
selector_name(feature_names, assert_present = FALSE)
selector_invert(selector)
selector_intersect(selector_x, selector_y)
selector_union(selector_x, selector_y)
selector_setdiff(selector_x, selector_y)
selector_missing()
selector_cardinality_greater_than(min_cardinality)

Value

function: A Selector function that takes a Task and returns the feature names to be processed.

Arguments

types: (character)
Type of feature to select
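pattern: (character(1))
Pattern matched against the feature names; passed to grep().
ignore.case: (logical(1))
Whether matching ignores case; passed to grep().
perl: (logical(1))
Whether to use Perl-compatible regular expressions; passed to grep().
fixed: (logical(1))
Whether to match pattern as a literal string instead of a regular expression; passed to grep().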
feature_names: (character)
Select features by exact name match.
assert_present: (logical(1))
Throw an error if feature_names are not all present in the task being operated on.
selector: (Selector)
Selector to invert.
selector_x: (Selector)
First Selector to query.
selector_y: (Selector)
Second Selector to query.
min_cardinality: (integer)
Threshold on the number of levels: a feature is selected if it has more than min_cardinality levels.
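
The effect of the grep() options and of assert_present can be sketched as follows. This is an illustration on the iris task, assuming mlr3 and mlr3pipelines are installed; "not_a_feature" is a made-up name that does not occur in the task.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# ignore.case = TRUE lets a lowercase pattern match "Sepal.Length" etc.
sel_ci = selector_grep("^sepal", ignore.case = TRUE)
sel_ci(task)

# fixed = TRUE treats the pattern as a literal string, so "." matches only
# an actual dot instead of any character.
sel_fixed = selector_grep(".Width", fixed = TRUE)
sel_fixed(task)

# With the default assert_present = FALSE, names that are not in the task
# are silently ignored.
sel_lenient = selector_name(c("Sepal.Length", "not_a_feature"))
sel_lenient(task)

# With assert_present = TRUE, the same names are expected to raise an error;
# tryCatch() turns that error into a marker value here.
sel_strict = selector_name(c("Sepal.Length", "not_a_feature"), assert_present = TRUE)
strict_result = tryCatch(sel_strict(task), error = function(e) "error")
strict_result
```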

Functions

selector_all: selector_all selects all features.
selector_none: selector_none selects none of the features.
selector_type: selector_type selects features according to type. Legal types are listed in mlr_reflections$task_feature_types.
selector_grep: selector_grep selects features with names matching the grep() pattern.
selector_name: selector_name selects features with names matching exactly the names listed.
selector_invert: selector_invert inverts a given Selector: It always selects the features that would be dropped by the other Selector, and drops the features that would be kept.
selector_intersect: selector_intersect selects the intersection of two Selectors: Only features selected by both Selectors are selected in the end.
selector_union: selector_union selects the union of two Selectors: Features selected by either Selector are selected in the end.
selector_setdiff: selector_setdiff selects the setdiff of two Selectors: Features selected by selector_x are selected, unless they are also selected by selector_y.
selector_missing: selector_missing selects features with missing values.
selector_cardinality_greater_than: selector_cardinality_greater_than selects categorical features with cardinality greater than a given threshold.
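
The combining and data-dependent Selectors listed above can be tried on a small task. The sketch below assumes mlr3 and mlr3pipelines are installed and builds a hypothetical toy task (iris with one missing value and an added three-level factor grp) purely for illustration.

```r
library("mlr3")
library("mlr3pipelines")

# Hypothetical toy data: iris plus one missing value and a three-level factor.
d = iris
d$Sepal.Length[1] = NA
d$grp = factor(rep(c("a", "b", "c"), length.out = nrow(d)))
task = as_task_classif(d, target = "Species", id = "iris_toy")

# Numeric features, except those whose names start with "Petal".
sel_sepal = selector_setdiff(selector_type("numeric"), selector_grep("^Petal"))
sel_sepal(task)

# Numeric features whose names end in "Width".
sel_width = selector_intersect(selector_type("numeric"), selector_grep("Width$"))
sel_width(task)

# Features that contain at least one missing value.
sel_na = selector_missing()
sel_na(task)

# Categorical features with more than two levels.
sel_card = selector_cardinality_greater_than(2)
sel_card(task)
```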

Details

A Selector is a function that has one input argument (commonly named task). The function is called with the Task that a PipeOp is operating on. The return value of the function must be a character vector that is a subset of the feature names present in the Task.

For example, a Selector that selects all columns is

function(task) {
  task$feature_names
}

(This is the Selector returned by selector_all().) A Selector that selects all columns with names shorter than four characters would be:

function(task) {
  task$feature_names[
    nchar(task$feature_names) < 4
  ]
}

A Selector that selects only the column "Sepal.Length" (as in the iris task), if present, is

function(task) {
  intersect(task$feature_names, "Sepal.Length")
}

Where possible, it is preferable to use the Selector constructor functions such as selector_type() and selector_grep() instead of writing custom Selectors.
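
To illustrate the relation between hand-written Selectors and the constructors, the sketch below compares the custom "Sepal.Length" Selector from above with selector_name() and passes the custom function to PipeOpSelect. It assumes mlr3 and mlr3pipelines are installed; custom_sel and built_sel are illustrative names.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Hand-written Selector and its constructor-based counterpart.
custom_sel = function(task) intersect(task$feature_names, "Sepal.Length")
built_sel = selector_name("Sepal.Length")

custom_sel(task)
built_sel(task)

# A hand-written Selector can be passed wherever a Selector is accepted.
po_custom = po("select", selector = custom_sel)
out_names = po_custom$train(list(task))[[1]]$feature_names
out_names
```

Both Selectors return the same feature here; the constructor version additionally offers the assert_present check without extra code.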

Examples

library("mlr3")
library("mlr3pipelines")  # provides the selector_*() functions

iris_task = tsk("iris")
bh_task = tsk("boston_housing")

sela = selector_all()
sela(iris_task)
sela(bh_task)

self = selector_type("factor")
self(iris_task)
self(bh_task)

selg = selector_grep("a.*i")
selg(iris_task)
selg(bh_task)

selgi = selector_invert(selg)
selgi(iris_task)
selgi(bh_task)

selgf = selector_union(selg, self)
selgf(iris_task)
selgf(bh_task)