ensemble_fselect: Wrapper-based Ensemble Feature Selection

Description

Ensemble feature selection using multiple learners. The method identifies the most predictive features in a dataset by combining several machine learning models with resampling. Returns an EnsembleFSResult.

Usage

ensemble_fselect(
  fselector,
  task,
  learners,
  init_resampling,
  inner_resampling,
  inner_measure,
  measure,
  terminator,
  callbacks = NULL,
  store_benchmark_result = TRUE,
  store_models = FALSE
)

Value

An EnsembleFSResult object.

Arguments

fselector: (FSelector)
Optimization algorithm.
task: (mlr3::Task)
Task to operate on.
learners: (list of mlr3::Learner)
The learners to be used for feature selection.
init_resampling: (mlr3::Resampling)
The initial resampling strategy of the data. Each train set is passed to the auto_fselector, which optimizes the learners and performs feature selection. Each test set is used for prediction with the final models returned by auto_fselector. Must be either mlr3::ResamplingSubsampling or mlr3::ResamplingBootstrap.
inner_resampling: (mlr3::Resampling)
The inner resampling strategy used by the FSelector.
inner_measure: (mlr3::Measure)
The inner optimization measure used by the FSelector.
measure: (mlr3::Measure)
Measure used to score each trained learner on the test sets generated by init_resampling.
terminator: (bbotk::Terminator)
Stop criterion of the feature selection.
callbacks: (Named list of lists of CallbackBatchFSelect)
Callbacks to be used for each learner. The lists must be named by the learner ids.
store_benchmark_result: (logical(1))
Whether to store the benchmark result in EnsembleFSResult or not.
store_models: (logical(1))
Whether to store models in auto_fselector or not.
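
As a rough illustration of how init_resampling and callbacks might be put together, the sketch below builds a subsampling strategy and a callback list named by learner id. It assumes the "mlr3fselect.one_se_rule" callback is registered in the installed version of mlr3fselect; the object names efsr_cb and cb_list are made up for this example.

```r
library(mlr3)
library(mlr3fselect)

learners = lrns(c("classif.rpart", "classif.featureless"))

# init_resampling must be subsampling or bootstrap; subsampling is used here
init_resampling = rsmp("subsampling", repeats = 2, ratio = 0.8)

# One list of callbacks per learner, named by the learner ids.
# Assumption: the one_se_rule callback exists in the installed version.
cb_list = list(
  classif.rpart = list(mlr3misc::clbk("mlr3fselect.one_se_rule")),
  classif.featureless = list(mlr3misc::clbk("mlr3fselect.one_se_rule"))
)

efsr_cb = ensemble_fselect(
  fselector = fs("random_search"),
  task = tsk("sonar"),
  learners = learners,
  init_resampling = init_resampling,
  inner_resampling = rsmp("cv", folds = 3),
  inner_measure = msr("classif.ce"),
  measure = msr("classif.acc"),
  terminator = trm("evals", n_evals = 5),
  callbacks = cb_list
)
```

The names of cb_list have to match the ids of the learners passed in learners, otherwise the callbacks cannot be assigned to a learner.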

Details

The method begins by applying the user-specified initial resampling to create multiple subsamples (train/test splits) of the original dataset. Resampling yields diverse subsets of the data, which makes the feature selection more robust.

For each subsample (train set) generated in the previous step, the method performs wrapper-based feature selection (auto_fselector) using each provided learner together with the given inner resampling method, inner performance measure and optimization algorithm. For each combination of subsample and learner, this process generates 1) the best feature subset and 2) a final model trained on these best features. The final models are then scored on their ability to predict on the resampled test sets.
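
The two steps above can be approximated by hand. The following is a minimal sketch of the procedure as described here, not the package's internal implementation; the variable names (scores, train_ids, test_ids) are chosen for illustration only.

```r
library(mlr3)
library(mlr3fselect)

set.seed(1)
task = tsk("sonar")
learners = lrns(c("classif.rpart", "classif.featureless"))
measure = msr("classif.acc")

# Step 1: the initial resampling creates the train/test splits
init_resampling = rsmp("subsampling", repeats = 2)
init_resampling$instantiate(task)

scores = list()
for (i in seq_len(init_resampling$iters)) {
  train_ids = init_resampling$train_set(i)
  test_ids = init_resampling$test_set(i)

  for (learner in learners) {
    # Step 2: wrapper-based feature selection on the train set only
    afs = auto_fselector(
      fselector = fs("random_search"),
      learner = learner$clone(),
      resampling = rsmp("cv", folds = 3),
      measure = msr("classif.ce"),
      terminator = trm("evals", n_evals = 10)
    )
    afs$train(task, row_ids = train_ids)

    # Step 3: score the final model on the held-out test set
    test_score = afs$predict(task, row_ids = test_ids)$score(measure)

    scores[[length(scores) + 1]] = data.frame(
      iteration = i,
      learner_id = learner$id,
      n_features = length(afs$fselect_result$features[[1]]),
      score = unname(test_score)
    )
  }
}
scores = do.call(rbind, scores)
print(scores)
```

Each row of scores corresponds to one combination of subsample and learner, which mirrors the layout of the result described in this section.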

Results are stored in an EnsembleFSResult.

The result object also includes the performance scores calculated during the inner resampling of the training sets, using models with the best feature subsets. These scores are stored in a column named {measure_id}_inner, where measure_id is the id of inner_measure.
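
To see where the inner scores end up, one might inspect the result table as sketched below. This assumes the table is available through the result field of the returned object and that the inner score column ends in "_inner" as described above; the names res and inner_cols are illustrative.

```r
library(mlr3)
library(mlr3fselect)

efsr = ensemble_fselect(
  fselector = fs("random_search"),
  task = tsk("sonar"),
  learners = lrns(c("classif.rpart", "classif.featureless")),
  init_resampling = rsmp("subsampling", repeats = 2),
  inner_resampling = rsmp("cv", folds = 3),
  inner_measure = msr("classif.ce"),
  measure = msr("classif.acc"),
  terminator = trm("evals", n_evals = 5)
)

# One row per combination of learner and init_resampling iteration
res = efsr$result
print(names(res))

# Columns ending in "_inner" hold the scores from the inner resampling
inner_cols = grep("_inner$", names(res), value = TRUE)
print(res[[inner_cols[1]]])
```

With two learners and two subsampling repeats, res is expected to have four rows.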

Examples

efsr = ensemble_fselect(
  fselector = fs("random_search"),
  task = tsk("sonar"),
  learners = lrns(c("classif.rpart", "classif.featureless")),
  init_resampling = rsmp("subsampling", repeats = 2),
  inner_resampling = rsmp("cv", folds = 3),
  inner_measure = msr("classif.ce"),
  measure = msr("classif.acc"),
  terminator = trm("evals", n_evals = 10)
)
efsr