Implementation of Permutation Feature Importance (PFI) using modular sampling approach. PFI measures the importance of a feature by calculating the increase in model error when the feature's values are randomly permuted, breaking the relationship between the feature and the target variable.
xplainfi::FeatureImportanceMethod -> xplainfi::PerturbationImportance -> PFI
new(): Creates a new instance of the PFI class.
PFI$new(
task,
learner,
measure = NULL,
resampling = NULL,
features = NULL,
groups = NULL,
relation = "difference",
n_repeats = 1L,
batch_size = NULL
)
Arguments: task, learner, measure, resampling, features, groups, relation, n_repeats, and batch_size are passed to PerturbationImportance.
compute(): Computes PFI scores.
PFI$compute(
n_repeats = NULL,
batch_size = NULL,
store_models = TRUE,
store_backends = TRUE
n_repeats (integer(1) | NULL: NULL) Number of permutation iterations. If NULL, uses the stored value.
batch_size (integer(1) | NULL: NULL) Maximum number of rows to predict at once. If NULL, uses the stored value.
store_models, store_backends (logical(1): TRUE) Whether to store fitted models and data backends; passed to mlr3::resample internally for the initial fit of the learner.
Storing these may be required for certain measures, so it is recommended to leave both enabled unless memory use is a concern.
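A minimal usage sketch, assuming the mlr3 ecosystem is installed; the specific task, learner, and measure names below are illustrative, and any compatible mlr3 objects should work:

```r
library(mlr3)
library(mlr3learners)
library(xplainfi)

# Illustrative choices of task, learner, and measure
task = tsk("german_credit")
learner = lrn("classif.ranger")
measure = msr("classif.ce")

pfi = PFI$new(
  task = task,
  learner = learner,
  measure = measure,
  resampling = rsmp("cv", folds = 3),
  n_repeats = 5L
)

# compute() without arguments uses the stored n_repeats;
# passing a value overrides it for this call only
pfi$compute(n_repeats = 10L)
```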
clone(): Objects of this class are cloneable with this method.
PFI$clone(deep = FALSE)
deep: Whether to make a deep clone.
Permutation Feature Importance was originally introduced by Breiman (2001) as part of the Random Forest algorithm. The method works by:
Computing baseline model performance on the original dataset
For each feature, randomly permuting its values while keeping other features unchanged
Computing model performance on the permuted dataset
Calculating importance as the difference (or ratio) between permuted and original performance
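The four steps above can be sketched in a few lines of base R. This is a simplified illustration on simulated data, not the xplainfi implementation (which supports resampling, grouped features, repeated permutations, and batching):

```r
# Simulated data where x1 is strongly predictive and x2 barely matters
set.seed(1)
n = 500
dat = data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y = 2 * dat$x1 + 0.1 * dat$x2 + rnorm(n)

fit = lm(y ~ x1 + x2, data = dat)
mse = function(pred) mean((dat$y - pred)^2)

baseline = mse(predict(fit, dat))      # step 1: baseline performance

importance = sapply(c("x1", "x2"), function(feat) {
  perm = dat
  perm[[feat]] = sample(perm[[feat]])  # step 2: permute one feature
  permuted = mse(predict(fit, perm))   # step 3: performance on permuted data
  permuted - baseline                  # step 4: importance as the difference
})
importance  # x1 should show a much larger error increase than x2
```

Using relation = "ratio" in PFI$new() corresponds to dividing the permuted error by the baseline instead of subtracting it.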
Breiman L (2001). “Random Forests.” Machine Learning, 45(1), 5--32. doi:10.1023/A:1010933404324.
Fisher A, Rudin C, Dominici F (2019). “All Models Are Wrong, but Many Are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously.” Journal of Machine Learning Research, 20, 177. https://pmc.ncbi.nlm.nih.gov/articles/PMC8323609/.
Strobl C, Boulesteix A, Kneib T, Augustin T, Zeileis A (2008). “Conditional Variable Importance for Random Forests.” BMC Bioinformatics, 9(1), 307. doi:10.1186/1471-2105-9-307.