Filter: Filter Base Class

Description

Base class for filters. Predefined filters are stored in the dictionary mlr_filters. A Filter calculates a score for each feature of a task. Important features get a large value and unimportant features get a small value. Note that filter scores may also be negative.
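The sketch below uses plain base R to show what "a score per feature" can look like, including a scoring rule whose values are all negative. It does not use mlr3. The function name `neg_pvalue_scores`, the negated-p-value rule, and the use of a data.frame in place of an mlr3::Task are all assumptions made for illustration.

```r
# Hypothetical scoring rule, not an mlr3filters filter: each feature is scored
# by the negated p-value of a Pearson correlation test against the target.
# Every score is <= 0, yet a larger score still marks a more important feature.
neg_pvalue_scores <- function(data, target) {
  features <- setdiff(names(data), target)
  y <- data[[target]]
  vapply(features, function(f) -cor.test(data[[f]], y)$p.value, numeric(1))
}

df <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(2, 1, 2, 1, 2),
  y  = c(1.1, 2.0, 2.9, 4.2, 5.0)
)
neg_pvalue_scores(df, "y")
```

Here x1 tracks y closely and receives a score near 0, while x2 is almost uncorrelated with y and receives a score near -1, so x1 ranks higher even though both scores are negative.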

Format

R6::R6Class object.

Construction

f = Filter$new(id, task_type, param_set, param_vals, feature_types, task_properties, packages)

id :: character(1) Identifier for the filter.
task_type :: character() Task types the filter can operate on, e.g. "classif" or "regr".
param_set :: paradox::ParamSet Set of hyperparameters.
param_vals :: named list() Named list of hyperparameter settings.
feature_types :: character() Feature types the filter operates on. Must be a subset of mlr_reflections$task_feature_types.
task_properties :: character() Required task properties, see mlr3::Task. Must be a subset of mlr_reflections$task_properties.
packages :: character() Set of required packages. Note that these packages will be loaded via requireNamespace(), and are not attached.
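The subset constraints on feature_types and task_properties, and the "load but do not attach" behaviour for packages, can be illustrated with a small base R validation sketch. The function `check_filter_args` is hypothetical, and the two allowed-value vectors are placeholders chosen for illustration; the authoritative sets are the ones stored in mlr_reflections.

```r
# Placeholder value sets (assumed for illustration only; the real sets are
# mlr_reflections$task_feature_types and mlr_reflections$task_properties).
allowed_feature_types <- c("logical", "integer", "numeric", "character", "factor", "ordered")
allowed_task_properties <- c("weights", "groups", "twoclass", "multiclass")

# Hypothetical argument check for a filter constructor.
check_filter_args <- function(id, task_type, feature_types, task_properties, packages) {
  stopifnot(is.character(id), length(id) == 1L)
  stopifnot(is.character(task_type))
  # each vector must be a subset of its allowed set
  stopifnot(all(feature_types %in% allowed_feature_types))
  stopifnot(all(task_properties %in% allowed_task_properties))
  # requireNamespace() loads a package namespace without attaching it
  ok <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  if (!all(ok)) stop("missing packages: ", paste(packages[!ok], collapse = ", "))
  invisible(TRUE)
}

check_filter_args("variance", "regr", "numeric", character(), "stats")
```

Because requireNamespace() only loads the namespace, functions from a required package must be called with the pkg::fun() form inside a filter.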

Fields

All arguments passed to the constructor are available as fields, and additionally:

scores :: named numeric() Stores the calculated filter scores as a named numeric vector. The vector is sorted in decreasing order, with any NA values last. Tied values (including NA values) appear in a random, non-deterministic order.
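One way to obtain exactly this ordering in base R is to shuffle the vector and then apply order(), which is stable: ties keep their (now random) relative positions. This is a minimal sketch; `sort_scores` is a hypothetical helper and not necessarily how the package implements the ordering.

```r
# Hypothetical helper: decreasing order, NA values last, random order among
# ties. Shuffling first and then using the stable order() randomises ties.
sort_scores <- function(scores) {
  shuffled <- scores[sample.int(length(scores))]
  shuffled[order(shuffled, decreasing = TRUE, na.last = TRUE)]
}

set.seed(1)
scores <- c(a = 0.2, b = NA, c = 0.9, d = 0.2, e = -0.5)
sort_scores(scores)
```

Whatever the seed, c comes first, a and d fill positions two and three in some order, e comes fourth, and b (NA) comes last.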

Methods

calculate(task, nfeat = NULL) (mlr3::Task, integer(1)) -> self Calculates the filter score values for the provided mlr3::Task and stores them in field scores. nfeat determines the minimum number of features to score (see "Partial Scoring"), and defaults to the number of features in task. Loads required packages and then calls $calculate_internal(). If the task has no rows, each feature gets the score NA.
calculate_internal(task, nfeat) (mlr3::Task, integer(1)) -> named numeric() Internal worker function. Each child class must implement this method. It takes a task and the minimum number of features to score, and must return a named numeric vector of scores. The higher the score, the more important the feature. The calling function (calculate()) ensures that the returned vector is sorted and that features without a score receive the value NA.
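The division of labour between calculate() and calculate_internal() can be sketched in base R as a wrapper plus a worker. This is not the mlr3filters code: a data.frame and a target column name stand in for the mlr3::Task, and `calculate_scores`, `variance_worker` and the worker signature function(data, target, nfeat) are hypothetical.

```r
# Hypothetical worker in the spirit of calculate_internal(): scores numeric
# features by their sample variance and returns an unsorted named vector.
variance_worker <- function(data, target, nfeat) {
  x <- data[setdiff(names(data), target)]
  # only numeric columns are scored; the others are left to the caller
  x <- x[vapply(x, is.numeric, logical(1))]
  # nfeat is unused: scoring every feature always satisfies the minimum
  vapply(x, var, numeric(1), na.rm = TRUE)
}

# Hypothetical wrapper in the spirit of calculate().
calculate_scores <- function(data, target, worker, nfeat = NULL, packages = character()) {
  features <- setdiff(names(data), target)
  # nfeat defaults to the number of features
  if (is.null(nfeat)) nfeat <- length(features)
  # load (but do not attach) the required packages
  for (pkg in packages) {
    if (!requireNamespace(pkg, quietly = TRUE)) stop("package not available: ", pkg)
  }
  # start with NA everywhere; this is also the result for a task with no rows
  scores <- setNames(rep(NA_real_, length(features)), features)
  if (nrow(data) > 0L) {
    partial <- worker(data, target, nfeat)
    scores[names(partial)] <- partial
  }
  # decreasing order, NA last, random order among ties
  scores <- scores[sample.int(length(scores))]
  scores[order(scores, decreasing = TRUE, na.last = TRUE)]
}

df <- data.frame(
  x1 = c(1, 5, 9),
  x2 = c(1, 1, 2),
  x3 = factor(c("a", "b", "a")),
  y  = c(0, 1, 0)
)
calculate_scores(df, "y", variance_worker, packages = "stats")
calculate_scores(df[0, ], "y", variance_worker)
```

On the full data, x1 scores 16, x2 scores 1/3 and the factor x3 is padded with NA; on the zero-row data every feature gets NA and the worker is never called.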

Partial Scoring

Some filters support partial scoring of the feature set: if nfeat is not NULL, only the best nfeat features are guaranteed to receive a score. The remaining features may be skipped for computational reasons; they then receive a score of NA.
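A partial-scoring worker only has to return the nfeat best features; the caller pads the rest with NA. In the base R sketch below, `top_nfeat_worker` and `pad_with_na` are hypothetical names, and the absolute correlation score is an assumed example. For brevity the worker computes every score and then drops all but the best nfeat; a real filter would gain its speed-up by never computing the dropped scores.

```r
# Hypothetical worker supporting partial scoring: returns scores only for the
# nfeat best features (absolute Pearson correlation with the target).
top_nfeat_worker <- function(data, target, nfeat) {
  features <- setdiff(names(data), target)
  y <- data[[target]]
  scores <- vapply(features, function(f) abs(cor(data[[f]], y)), numeric(1))
  head(sort(scores, decreasing = TRUE), nfeat)
}

# Hypothetical caller-side step: unscored features get NA and sort last.
pad_with_na <- function(partial, features) {
  out <- setNames(rep(NA_real_, length(features)), features)
  out[names(partial)] <- partial
  out[order(out, decreasing = TRUE, na.last = TRUE)]
}

df <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(2, 1, 2, 1, 2),
  x3 = c(5, 3, 4, 1, 2),
  y  = c(1.1, 2.0, 2.9, 4.2, 5.0)
)
pad_with_na(top_nfeat_worker(df, "y", nfeat = 1L), c("x1", "x2", "x3"))
```

With nfeat = 1 only x1 receives a score; x2 and x3 are reported as NA even though x3 is also fairly strongly (negatively) correlated with y.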