mlr3pipelines (version 0.5.1)

Selector: Selector Functions

Description

A Selector function is used by different PipeOps, most prominently PipeOpSelect and many PipeOps inheriting from PipeOpTaskPreproc, to determine a subset of Tasks to operate on.

Even though a Selector is a function that can be written itself, it is preferable to use the Selector constructors shown here. Each of these can be called with its arguments to create a Selector, which can then be given to the PipeOpSelect selector parameter, or many PipeOpTaskPreprocs' affect_columns parameter. See there for examples of this usage.

Usage

selector_all()

selector_none()

selector_type(types)

selector_grep(pattern, ignore.case = FALSE, perl = FALSE, fixed = FALSE)

selector_name(feature_names, assert_present = FALSE)

selector_invert(selector)

selector_intersect(selector_x, selector_y)

selector_union(selector_x, selector_y)

selector_setdiff(selector_x, selector_y)

selector_missing()

selector_cardinality_greater_than(min_cardinality)

Value

function: A Selector function that takes a Task and returns the feature names to be processed.

Arguments

types

(character)
Type of feature to select

pattern

(character(1))
grep pattern

ignore.case

(logical(1))
ignore case

perl

(logical(1))
perl regex

fixed

(logical(1))
fixed pattern instead of regex

feature_names

(character)
Select features by exact name match.

assert_present

(logical(1))
Throw an error if feature_names are not all present in the task being operated on.

selector

(Selector)
Selector to invert.

selector_x

(Selector)
First Selector to query.

selector_y

(Selector)
Second Selector to query.

min_cardinality

(integer)
Minimum number of levels required to be selected.

Functions

  • selector_all(): selector_all selects all features.

  • selector_none(): selector_none selects none of the features.

  • selector_type(): selector_type selects features according to type. Legal types are listed in mlr_reflections$task_feature_types.

  • selector_grep(): selector_grep selects features with names matching the grep() pattern.

  • selector_name(): selector_name selects features with names matching exactly the names listed.

  • selector_invert(): selector_invert inverts a given Selector: It always selects the features that would be dropped by the other Selector, and drops the features that would be kept.

  • selector_intersect(): selector_intersect selects the intersection of two Selectors: Only features selected by both Selectors are selected in the end.

  • selector_union(): selector_union selects the union of two Selectors: Features selected by either Selector are selected in the end.

  • selector_setdiff(): selector_setdiff selects the setdiff of two Selectors: Features selected by selector_x are selected, unless they are also selected by selector_y.

  • selector_missing(): selector_missing selects features with missing values.

  • selector_cardinality_greater_than(): selector_cardinality_greater_than selects categorical features with cardinality greater then a given threshold.

Details

A Selector is a function that has one input argument (commonly named task). The function is called with the Task that a PipeOp is operating on. The return value of the function must be a character vector that is a subset of the feature names present in the Task.

For example, a Selector that selects all columns is

function(task) {
  task$feature_names
}

(this is the selector_all()-Selector.) A Selector that selects all columns that have names shorter than four letters would be:

function(task) {
  task$feature_names[
    nchar(task$feature_names) < 4
  ]
}

A Selector that selects only the column "Sepal.Length" (as in the iris task), if present, is

function(task) {
  intersect(task$feature_names, "Sepal.Length")
}

It is preferable to use the Selector construction functions like select_type, select_grep etc. if possible, instead of writing custom Selectors.

See Also

Other Selectors: mlr_pipeops_select

Examples

Run this code
library("mlr3")

iris_task = tsk("iris")
bh_task = tsk("boston_housing")

sela = selector_all()
sela(iris_task)
sela(bh_task)

self = selector_type("factor")
self(iris_task)
self(bh_task)

selg = selector_grep("a.*i")
selg(iris_task)
selg(bh_task)

selgi = selector_invert(selg)
selgi(iris_task)
selgi(bh_task)

selgf = selector_union(selg, self)
selgf(iris_task)
selgf(bh_task)

Run the code above in your browser using DataLab