mlr3filters

Package website: release | dev

{mlr3filters} adds feature selection filters to mlr3. The implemented filters can be used stand-alone, or as part of a machine learning pipeline in combination with mlr3pipelines and the filter operator.

Wrapper methods for feature selection are implemented in mlr3fselect. Learners that support the extraction of feature importance scores can be combined with a filter from this package for embedded feature selection.

Installation

CRAN version

install.packages("mlr3filters")

Development version

remotes::install_github("mlr-org/mlr3filters")

Filters

Filter Example

set.seed(1)
library("mlr3")
library("mlr3filters")

task = tsk("sonar")
filter = flt("auc")
head(as.data.table(filter$calculate(task)))
##    feature     score
## 1:     V11 0.2811368
## 2:     V12 0.2429182
## 3:     V10 0.2327018
## 4:     V49 0.2312622
## 5:      V9 0.2308442
## 6:     V48 0.2062784
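
Once scores are calculated, they can be used to reduce a task directly. The following is a minimal sketch, assuming the `$scores` field of the filter (a named numeric vector sorted in decreasing order) and `Task$select()` from mlr3. The cutoff of 10 features is arbitrary and chosen only for illustration.

```r
library("mlr3")
library("mlr3filters")

task = tsk("sonar")
filter = flt("auc")
filter$calculate(task)

# names of the 10 highest-scoring features (10 is an arbitrary choice)
top_features = names(head(filter$scores, 10))

# $select() modifies the task in place, so work on a clone
task_small = task$clone()
task_small$select(top_features)
length(task_small$feature_names)
```

The original task is left untouched, so different cutoffs can be compared against the same starting point.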

Implemented Filters

| Name | Label | Task Types | Feature Types | Package |
|---|---|---|---|---|
| anova | ANOVA F-Test | Classif | Integer, Numeric | stats |
| auc | Area Under the ROC Curve Score | Classif | Integer, Numeric | mlr3measures |
| carscore | Correlation-Adjusted coRrelation Score | Regr | Logical, Integer, Numeric | care |
| carsurvscore | Correlation-Adjusted coRrelation Survival Score | Surv | Integer, Numeric | carSurv, mlr3proba |
| cmim | Minimal Conditional Mutual Information Maximization | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| correlation | Correlation | Regr | Integer, Numeric | stats |
| disr | Double Input Symmetrical Relevance | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| find_correlation | Correlation-based Score | Universal | Integer, Numeric | stats |
| importance | Importance Score | Universal | Logical, Integer, Numeric, Character, Factor, Ordered, POSIXct | |
| information_gain | Information Gain | Classif & Regr | Integer, Numeric, Factor, Ordered | FSelectorRcpp |
| jmi | Joint Mutual Information | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| jmim | Minimal Joint Mutual Information Maximization | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| kruskal_test | Kruskal-Wallis Test | Classif | Integer, Numeric | stats |
| mim | Mutual Information Maximization | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| mrmr | Minimum Redundancy Maximal Relevancy | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| njmim | Minimal Normalised Joint Mutual Information Maximization | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| performance | Predictive Performance | Universal | Logical, Integer, Numeric, Character, Factor, Ordered, POSIXct | |
| permutation | Permutation Score | Universal | Logical, Integer, Numeric, Character, Factor, Ordered, POSIXct | |
| relief | RELIEF | Classif & Regr | Integer, Numeric, Factor, Ordered | FSelectorRcpp |
| selected_features | Embedded Feature Selection | Universal | Logical, Integer, Numeric, Character, Factor, Ordered, POSIXct | |
| univariate_cox | Univariate Cox Survival Score | Surv | Integer, Numeric, Logical | survival |
| variance | Variance | Universal | Integer, Numeric | stats |
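
The same information can be looked up from within R. This is a small sketch, assuming the `mlr_filters` dictionary and the `label`, `feature_types` and `packages` fields of a filter object. The choice of the `variance` filter is only for illustration.

```r
library("mlr3")
library("mlr3filters")

# keys that can be passed to flt()
keys = mlr_filters$keys()
keys

# metadata of a single filter
f = flt("variance")
f$label
f$feature_types
f$packages
```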

Variable Importance Filters

The following learners allow the extraction of variable importance and therefore are supported by FilterImportance:

## [1] "classif.featureless" "classif.ranger"      "classif.rpart"      
## [4] "classif.xgboost"     "regr.featureless"    "regr.ranger"        
## [7] "regr.rpart"          "regr.xgboost"

If your learner is not listed here but capable of extracting variable importance from the fitted model, the reason is most likely that it is not yet integrated in the package mlr3learners or the extra learner extension. Please open an issue so we can add your package.
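
A list like the one above can be obtained by querying the learner dictionary for the `"importance"` property. This is a sketch under the assumption that `as.data.table(mlr_learners)` returns a `properties` list column. The result depends on which learner packages are loaded, so it may differ from the output shown here.

```r
library("mlr3")
library("mlr3learners")

learners = as.data.table(mlr_learners)

# keep learners whose properties include "importance"
has_importance = vapply(
  learners$properties,
  function(p) "importance" %in% p,
  logical(1)
)
importance_learners = learners$key[has_importance]
importance_learners
```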

Some learners need to have their variable importance measure “activated” during learner creation. For example, to use the “impurity” measure of Random Forest via the {ranger} package:

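library("mlr3learners")

task = tsk("iris")
learner = lrn("classif.ranger", seed = 42)
# note: assigning to `$values` replaces every value set so far, including `seed`
learner$param_set$values = list(importance = "impurity")

filter = flt("importance", learner = learner)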
filter$calculate(task)
head(as.data.table(filter), 3)
##         feature     score
## 1: Petal.Length 44.682462
## 2:  Petal.Width 43.113031
## 3: Sepal.Length  9.039099

Performance Filter

FilterPerformance is a univariate filter method: it calls resample() once per feature, each time fitting the learner on that single feature, and ranks the features by the resulting score of the supplied measure. Any learner can be passed to this filter, with classif.rpart being the default. Regression learners can be passed as well if the task is of type “regr”.
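
A minimal sketch of the performance filter follows, assuming the constructor arguments `learner`, `resampling` and `measure`. The 3-fold cross-validation and the accuracy measure are illustrative choices, not recommendations.

```r
library("mlr3")
library("mlr3filters")

set.seed(1)
task = tsk("iris")

# one resampling run per feature, scored by classification accuracy
filter = flt("performance",
  learner = lrn("classif.rpart"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.acc")
)
filter$calculate(task)
scores = as.data.table(filter)
scores
```

Because every feature triggers its own resampling run, the runtime grows linearly with the number of features.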

Filter-based Feature Selection

In many cases filtering is only one step in the modeling pipeline. To select features based on filter values, one can use PipeOpFilter from mlr3pipelines.

library(mlr3pipelines)
task = tsk("spam")

# the `filter.frac` should be tuned
graph = po("filter", filter = flt("auc"), filter.frac = 0.5) %>>%
  po("learner", lrn("classif.rpart"))

learner = as_learner(graph)
rr = resample(task, learner, rsmp("holdout"))
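
To judge whether a given `filter.frac` is a sensible choice, the resampling result can be scored. The sketch below repeats the pipeline so that it runs on its own. The classification error measure is an assumption made for illustration.

```r
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

set.seed(1)
task = tsk("spam")

# keep the top 50% of features by AUC score, then fit a decision tree
graph = po("filter", filter = flt("auc"), filter.frac = 0.5) %>>%
  po("learner", lrn("classif.rpart"))
learner = as_learner(graph)
rr = resample(task, learner, rsmp("holdout"))

# aggregated classification error on the holdout set
score = rr$aggregate(msr("classif.ce"))
score
```

Since the filter is part of the graph, it is recomputed on each training set only, which avoids leaking information from the test set into the feature selection.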

Version: 0.8.0

License: LGPL-3

Last Published: April 10th, 2024

Functions in mlr3filters (0.8.0)

| Function | Description |
|---|---|
| mlr_filters_carscore | Correlation-Adjusted Marginal Correlation Score Filter |
| Filter | Filter Base Class |
| mlr_filters_boruta | Boruta Filter |
| mlr_filters_carsurvscore | Correlation-Adjusted Survival Score Filter |
| flt | Syntactic Sugar for Filter Construction |
| mlr_filters_auc | AUC Filter |
| mlr3filters-package | mlr3filters: Filter Based Feature Selection for 'mlr3' |
| mlr_filters_anova | ANOVA F-Test Filter |
| mlr_filters | Dictionary of Filters |
| mlr_filters_cmim | Minimal Conditional Mutual Information Maximization Filter |
| mlr_filters_correlation | Correlation Filter |
| mlr_filters_mim | Mutual Information Maximization Filter |
| mlr_filters_importance | Filter for Embedded Feature Selection via Variable Importance |
| mlr_filters_mrmr | Minimum Redundancy Maximal Relevancy Filter |
| mlr_filters_find_correlation | Correlation Filter |
| mlr_filters_jmim | Minimal Joint Mutual Information Maximization Filter |
| mlr_filters_jmi | Joint Mutual Information Filter |
| mlr_filters_kruskal_test | Kruskal-Wallis Test Filter |
| mlr_filters_information_gain | Information Gain Filter |
| mlr_filters_disr | Double Input Symmetrical Relevance Filter |
| mlr_filters_performance | Predictive Performance Filter |
| mlr_filters_selected_features | Filter for Embedded Feature Selection |
| reexports | Objects exported from other packages |
| mlr_filters_univariate_cox | Univariate Cox Survival Filter |
| mlr_filters_njmim | Minimal Normalised Joint Mutual Information Maximization Filter |
| mlr_filters_permutation | Permutation Score Filter |
| mlr_filters_variance | Variance Filter |
| mlr_filters_relief | RELIEF Filter |