mlr3fselect

Package website: release | dev

mlr3fselect is the feature selection package of the mlr3 ecosystem. It searches for the best-performing feature set for any mlr3 learner. The package provides several optimization algorithms, e.g. Random Search, Recursive Feature Elimination, and Genetic Search. It can also optimize the feature set of a learner automatically during training and estimate the performance of the selected feature sets with nested resampling. The package is built on the optimization framework bbotk.
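As a rough illustration of the nested resampling mentioned above, the following sketch wraps a learner with auto_fselector() and evaluates it with an outer resampling. The task ("sonar"), the learner (classif.rpart), and the small evaluation budget are assumptions chosen only to keep the run short; they are not taken from the package documentation.

```r
library("mlr3verse")

set.seed(1)

# AutoFSelector: the feature selection runs inside $train() of the wrapped learner
afs = auto_fselector(
  fselector = fs("random_search", batch_size = 5),
  learner = lrn("classif.rpart"),      # assumed learner, chosen for speed
  resampling = rsmp("holdout"),        # inner resampling used to compare feature sets
  measure = msr("classif.ce"),
  term_evals = 10                      # assumed budget, kept small on purpose
)

# Outer resampling estimates the performance of the whole selection procedure
rr = resample(tsk("sonar"), afs, rsmp("cv", folds = 3), store_models = TRUE)
outer_error = rr$aggregate(msr("classif.ce"))
print(outer_error)

# Feature sets selected in each outer fold (requires store_models = TRUE)
inner_results = extract_inner_fselect_results(rr)
print(inner_results)
```

The outer error is an estimate for the combined procedure of selecting features and fitting the model, which is usually less optimistic than the inner score of the best feature set.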

Resources

There are several sections about feature selection in the mlr3book.

The gallery features a collection of case studies and demos about optimization.

The cheatsheet summarizes the most important functions of mlr3fselect.

Installation

Install the latest release from CRAN:

install.packages("mlr3fselect")

Install the development version from GitHub:

remotes::install_github("mlr-org/mlr3fselect")

Example

We run a feature selection for a support vector machine on the Spam data set.

library("mlr3verse")

tsk("spam")
## <TaskClassif:spam> (4601 x 58): HP Spam Detection
## * Target: type
## * Properties: twoclass
## * Features (57):
##   - dbl (57): address, addresses, all, business, capitalAve, capitalLong, capitalTotal,
##     charDollar, charExclamation, charHash, charRoundbracket, charSemicolon,
##     charSquarebracket, conference, credit, cs, data, direct, edu, email, font, free,
##     george, hp, hpl, internet, lab, labs, mail, make, meeting, money, num000, num1999,
##     num3d, num415, num650, num85, num857, order, original, our, over, parts, people, pm,
##     project, re, receive, remove, report, table, technology, telnet, will, you, your

We construct an instance with the fsi() function. The instance describes the optimization problem.

instance = fsi(
  task = tsk("spam"),
  learner = lrn("classif.svm", type = "C-classification"),
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce"),
  terminator = trm("evals", n_evals = 20)
)
instance
## <FSelectInstanceBatchSingleCrit>
## * State:  Not optimized
## * Objective: <ObjectiveFSelect:classif.svm_on_spam>
## * Terminator: <TerminatorEvals>

We select a simple random search as the optimization algorithm.

fselector = fs("random_search", batch_size = 5)
fselector
## <FSelectorBatchRandomSearch>: Random Search
## * Parameters: batch_size=5
## * Properties: single-crit, multi-crit
## * Packages: mlr3fselect
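Random search is only one of the available algorithms. A small sketch of how the dictionary mlr_fselectors can be used to look up the others and to construct them with the same fs() sugar function; the two alternatives picked here are examples, not a recommendation.

```r
library("mlr3verse")

# Keys of all feature selection algorithms registered in the dictionary
keys = mlr_fselectors$keys()
print(keys)

# Tabular overview of the same dictionary
overview = as.data.table(mlr_fselectors)
print(overview)

# Any key can be passed to fs() to construct the corresponding fselector
fs_sequential = fs("sequential")
fs_rfe = fs("rfe")
```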

To start the feature selection, we simply pass the instance to the fselector.

fselector$optimize(instance)

The fselector writes the best feature set to the instance.

instance$result_feature_set
##  [1] "address"           "addresses"         "all"               "business"
##  [5] "capitalAve"        "capitalLong"       "capitalTotal"      "charDollar"
##  [9] "charExclamation"   "charHash"          "charRoundbracket"  "charSemicolon"
## [13] "charSquarebracket" "conference"        "credit"            "cs"
## [17] "data"              "direct"            "edu"               "email"
## [21] "font"              "free"              "george"            "hp"
## [25] "internet"          "lab"               "labs"              "mail"
## [29] "make"              "meeting"           "money"             "num000"
## [33] "num1999"           "num3d"             "num415"            "num650"
## [37] "num85"             "num857"            "order"             "our"
## [41] "parts"             "people"            "pm"                "project"
## [45] "re"                "receive"           "remove"            "report"
## [49] "table"             "technology"        "telnet"            "will"
## [53] "you"               "your"

The instance also stores the corresponding measured performance.

instance$result_y
## classif.ce
## 0.07042005

The archive contains all evaluated feature sets.

as.data.table(instance$archive)
##     address addresses   all business capitalAve capitalLong capitalTotal charDollar charExclamation
##  1:    TRUE      TRUE  TRUE     TRUE       TRUE        TRUE         TRUE       TRUE            TRUE
##  2:    TRUE      TRUE  TRUE    FALSE      FALSE        TRUE         TRUE       TRUE            TRUE
##  3:    TRUE      TRUE FALSE    FALSE       TRUE        TRUE         TRUE       TRUE            TRUE
##  4:    TRUE      TRUE  TRUE     TRUE       TRUE        TRUE         TRUE       TRUE            TRUE
##  5:   FALSE     FALSE FALSE    FALSE      FALSE       FALSE        FALSE       TRUE           FALSE
## ---
## 16:   FALSE     FALSE FALSE    FALSE      FALSE       FALSE        FALSE      FALSE           FALSE
## 17:   FALSE     FALSE FALSE     TRUE       TRUE        TRUE        FALSE      FALSE            TRUE
## 18:   FALSE     FALSE  TRUE     TRUE      FALSE       FALSE        FALSE       TRUE           FALSE
## 19:    TRUE      TRUE  TRUE     TRUE      FALSE        TRUE         TRUE       TRUE            TRUE
## 20:    TRUE     FALSE  TRUE    FALSE      FALSE        TRUE        FALSE       TRUE           FALSE
## 56 variables not shown: [charHash, charRoundbracket, charSemicolon, charSquarebracket, conference, credit, cs, data, direct, edu, ...]
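Because the archive converts to a data.table, it can be sorted and summarized like any other table. The sketch below is self-contained, so it builds its own small instance with the fselect() shortcut, which constructs the instance and runs the optimization in one call. The task ("sonar"), the learner (classif.rpart), the budget, and the column name n_selected are assumptions made for a quick illustration.

```r
library("mlr3verse")
library("data.table")

set.seed(1)

task = tsk("sonar")

# fselect() creates the instance and optimizes it in a single call
instance = fselect(
  fselector = fs("random_search", batch_size = 5),
  task = task,
  learner = lrn("classif.rpart"),      # assumed learner, chosen for speed
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce"),
  term_evals = 10                      # assumed budget, kept small on purpose
)

archive = as.data.table(instance$archive)

# Each feature is a logical column, so the row sum is the size of the feature set
archive[, n_selected := rowSums(.SD), .SDcols = task$feature_names]

# Five best feature sets, lowest classification error first
best = head(archive[order(classif.ce), .(classif.ce, n_selected)], 5)
print(best)
```

Sorting by error and comparing set sizes makes it easy to spot smaller feature sets that perform almost as well as the best one.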

We fit a final model with the optimized feature set to make predictions on new data.

task = tsk("spam")
learner = lrn("classif.svm", type = "C-classification")

task$select(instance$result_feature_set)
learner$train(task)
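A minimal sketch of the prediction step follows. To stay self-contained it does not rerun the optimization: the feature set selected below is a hypothetical placeholder for instance$result_feature_set, a random sample of rows stands in for new data, and classif.rpart replaces the SVM so that no additional package is needed.

```r
library("mlr3verse")

set.seed(1)

task = tsk("spam")
learner = lrn("classif.rpart")         # assumed learner; the example above uses classif.svm

# Hypothetical feature set; in practice use instance$result_feature_set
selected = c("charDollar", "charExclamation", "remove", "free", "hp")
task$select(selected)

# Hold out 100 random rows to play the role of new data
new_rows = sample(task$row_ids, 100)
train_rows = setdiff(task$row_ids, new_rows)
new_data = task$data(rows = new_rows)

# Train on the remaining rows with the reduced feature set
learner$train(task, row_ids = train_rows)

# Predict on the held-out data and score the predictions
prediction = learner$predict_newdata(new_data)
error = prediction$score(msr("classif.ce"))
print(error)
```

The new data only needs the selected feature columns; the target column is included here so that the prediction can be scored.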

Monthly Downloads: 2,937

Version: 1.3.0

License: LGPL-3

Maintainer: Marc Becker

Last Published: January 16th, 2025

Functions in mlr3fselect (1.3.0)

FSelectorBatch
Class for Batch Feature Selection Algorithms

ArchiveBatchFSelect
Class for Logging Evaluated Feature Sets

FSelectInstanceBatchSingleCrit
Class for Single Criterion Feature Selection

FSelectorBatchFromOptimizerBatch
FSelectorBatchFromOptimizerBatch

ContextBatchFSelect
Evaluation Context

AutoFSelector
Class for Automatic Feature Selection

ObjectiveFSelect
Class for Feature Selection Objective

extract_inner_fselect_archives
Extract Inner Feature Selection Archives

ObjectiveFSelectBatch
Class for Feature Selection Objective

fs
Syntactic Sugar for FSelect Construction

fselect
Function for Feature Selection

auto_fselector
Function for Automatic Feature Selection

extract_inner_fselect_results
Extract Inner Feature Selection Results

ensemble_fs_result
Ensemble Feature Selection Result

callback_batch_fselect
Create Feature Selection Callback

embedded_ensemble_fselect
Embedded Ensemble Feature Selection

mlr3fselect.svm_rfe
SVM-RFE Callback

mlr3fselect_assertions
Assertion for mlr3fselect objects

mlr_fselectors_genetic_search
Feature Selection with Genetic Search

mlr_fselectors_exhaustive_search
Feature Selection with Exhaustive Search

ensemble_fselect
Wrapper-based Ensemble Feature Selection

fselect_nested
Function for Nested Resampling

mlr3fselect-package
mlr3fselect: Feature Selection for 'mlr3'

mlr3fselect.backup
Backup Benchmark Result Callback

mlr_fselectors_rfecv
Feature Selection with Recursive Feature Elimination with Cross Validation

mlr_fselectors_sequential
Feature Selection with Sequential Search

mlr_fselectors
Dictionary of FSelectors

mlr_fselectors_design_points
Feature Selection with Design Points

mlr3fselect.internal_tuning
Internal Tuning Callback

mlr3fselect.one_se_rule
One Standard Error Rule Callback

fsi
Syntactic Sugar for Instance Construction

mlr_fselectors_shadow_variable_search
Feature Selection with Shadow Variable Search

mlr_fselectors_random_search
Feature Selection with Random Search

mlr_fselectors_rfe
Feature Selection with Recursive Feature Elimination

reexports
Objects exported from other packages

FSelectInstanceBatchMultiCrit
Class for Multi Criteria Feature Selection

FSelector
FSelector

CallbackBatchFSelect
Create Feature Selection Callback