embedded_ensemble_fselect: Embedded Ensemble Feature Selection

Description

Ensemble feature selection using multiple learners. The method identifies the most predictive features in a dataset by combining several machine learning models with resampling. Returns an EnsembleFSResult.

Usage

embedded_ensemble_fselect(
  task,
  learners,
  init_resampling,
  measure,
  store_benchmark_result = TRUE
)

Value

An EnsembleFSResult object.

Arguments

task: (mlr3::Task)
Task to operate on.
learners: (list of mlr3::Learner)
The learners to be used for feature selection. All learners must have the selected_features property, i.e. implement embedded feature selection (e.g. regularized models).
init_resampling: (mlr3::Resampling)
The initial resampling strategy of the data, from which each train set will be passed on to the learners and each test set will be used for prediction. Can only be mlr3::ResamplingSubsampling or mlr3::ResamplingBootstrap.
measure: (mlr3::Measure)
The measure used to score each learner on the test sets generated by init_resampling. If NULL, the default measure is used.
store_benchmark_result: (logical(1))
Whether to store the benchmark result in EnsembleFSResult or not.
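The constraints on learners and init_resampling described above can be checked before calling the function. The following is only a sketch using plain mlr3 calls; the variable names has_embedded and ok_resampling are illustrative and not part of the package, and the function presumably performs its own argument checks.

```r
library(mlr3)

# Candidate learners for the ensemble.
learners = lrns(c("classif.rpart", "classif.featureless"))

# Each learner must advertise the "selected_features" property.
has_embedded = vapply(
  learners,
  function(l) "selected_features" %in% l$properties,
  logical(1)
)

# The initial resampling must be subsampling or bootstrap.
resampling = rsmp("subsampling", repeats = 5)
ok_resampling = inherits(
  resampling,
  c("ResamplingSubsampling", "ResamplingBootstrap")
)

print(has_embedded)
print(ok_resampling)
```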

Details

The method first applies the user-specified initial resampling to split the original dataset into multiple train/test subsamples. These diverse subsets of the data make the feature selection more robust.

For each subsample, every learner is trained on the train set, and the features it selects during training are recorded. The trained learner is then scored on the corresponding test set. This yields one feature set and one performance score for each combination of subsample and learner.

Results are stored in an EnsembleFSResult.
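The procedure described above can be approximated by hand with plain mlr3 calls. This is a minimal sketch for illustration, not the package's implementation: the explicit loop and the results list are assumptions made here, and the actual function may organize the computation differently (for example through a benchmark).

```r
library(mlr3)

set.seed(1)
task = tsk("sonar")
learners = lrns(c("classif.rpart", "classif.featureless"))
measure = msr("classif.ce")

# Step 1: initial resampling creates the train/test splits.
resampling = rsmp("subsampling", repeats = 3)
resampling$instantiate(task)

# Step 2: train every learner on every train set, record the
# selected features, and score on the matching test set.
results = list()
for (i in seq_len(resampling$iters)) {
  train_ids = resampling$train_set(i)
  test_ids = resampling$test_set(i)
  for (learner in learners) {
    l = learner$clone(deep = TRUE)
    l$train(task, row_ids = train_ids)
    pred = l$predict(task, row_ids = test_ids)
    results[[length(results) + 1]] = list(
      learner_id = l$id,
      iteration = i,
      features = l$selected_features(),
      score = unname(pred$score(measure))
    )
  }
}

# One entry per (subsample, learner) combination.
length(results)
```

Each entry pairs a feature set with a test-set score, which is the information the text says is collected for every combination of subsample and learner.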

Examples

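# Load the package; it attaches mlr3, which provides tsk(), lrns(), rsmp() and msr().
library(mlr3fselect)

# Run two learners with embedded feature selection on 5 subsamples of the sonar task.
eefsr = embedded_ensemble_fselect(
  task = tsk("sonar"),
  learners = lrns(c("classif.rpart", "classif.featureless")),
  init_resampling = rsmp("subsampling", repeats = 5),
  measure = msr("classif.ce")
)

# Print the resulting EnsembleFSResult.
eefsr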