Learn R Programming

xplainfi

The goal of xplainfi is to collect common feature importance methods under a unified and extensible interface.

It is built around mlr3 as available abstractions for learners, tasks, measures, etc. greatly simplify the implementation of importance measures.

Installation

Install xplainfi from CRAN:

install.packages("xplainfi")

Or install from R-universe:

install.packages("xplainfi", repos = c("https://mlr-org.r-universe.dev", "https://cloud.r-project.org"))

The latest development version of xplainfi can be installed with pak:

# install.packages(pak)
pak::pak("mlr-org/xplainfi")

Example: PFI

Here is a basic example on how to calculate PFI for an untrained learner and task, using cross-validation for resampling and computing PFI within each resampling iteration 10 times on the friedman1 task (see ?mlbench::mlbench.friedman1).

The friedman1 task has the following structure:

$$y = 10 \sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + \varepsilon$$

Where $x_{{1,2,3,4,5}}$ are named important1 through important5 in the Task, with additional numbered unimportant features without effect on $y$.

library(xplainfi)
library(mlr3learners)
#> Loading required package: mlr3

task = tgen("friedman1")$generate(1000)
learner = lrn("regr.ranger", num.trees = 100)
measure = msr("regr.mse")

pfi = PFI$new(
    task = task,
    learner = learner,
    measure = measure,
    resampling = rsmp("cv", folds = 3),
    n_repeats = 30
)

Compute and print PFI scores:

pfi$compute()
pfi$importance()
#> Key: <feature>
#>          feature   importance
#>           <char>        <num>
#>  1:   important1  8.183995584
#>  2:   important2  7.481268675
#>  3:   important3  1.571760349
#>  4:   important4 12.585739572
#>  5:   important5  2.810875567
#>  6: unimportant1  0.030667439
#>  7: unimportant2 -0.002837696
#>  8: unimportant3 -0.044922079
#>  9: unimportant4 -0.060054450
#> 10: unimportant5  0.060148388

If it aids interpretation, importances can also be calculated as the ratio rather than the difference between the baseline and post-permutation losses:

pfi$importance(relation = "ratio")
#> Key: <feature>
#>          feature importance
#>           <char>      <num>
#>  1:   important1  2.6987668
#>  2:   important2  2.5598945
#>  3:   important3  1.3294180
#>  4:   important4  3.6278508
#>  5:   important5  1.5874860
#>  6: unimportant1  1.0067957
#>  7: unimportant2  0.9994507
#>  8: unimportant3  0.9905990
#>  9: unimportant4  0.9874657
#> 10: unimportant5  1.0126572

When PFI is computed based on resampling with multiple iterations, and / or multiple permutation iterations, the individual scores can be retrieved as a data.table:

str(pfi$scores())
#> Classes 'data.table' and 'data.frame':   900 obs. of  6 variables:
#>  $ feature          : chr  "important1" "important1" "important1" "important1" ...
#>  $ iter_rsmp        : int  1 1 1 1 1 1 1 1 1 1 ...
#>  $ iter_repeat      : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ regr.mse_baseline: num  4.56 4.56 4.56 4.56 4.56 ...
#>  $ regr.mse_post    : num  12.3 11.9 11.3 12.1 13.6 ...
#>  $ importance       : num  7.77 7.33 6.74 7.56 9.06 ...
#>  - attr(*, ".internal.selfref")=<externalptr>

Where iter_rsmp corresponds to the resampling iteration, i.e., 3 for 3-fold cross-validation, and iter_repeat corresponds to the permutation iteration within each resampling iteration, 5 in this case. While pfi$importance() contains the means across all iterations, pfi$scores() allows you to manually visualize or aggregate them in any way you see fit.

For example:

library(ggplot2)

ggplot(
    pfi$scores(),
    aes(x = importance, y = reorder(feature, importance))
) +
    geom_boxplot(color = "#f44560", fill = alpha("#f44560", 0.4)) +
    labs(
        title = "Permutation Feature Importance on Friedman1",
        subtitle = "Computed over 3-fold CV with 5 permutations per iteration using Random Forest",
        x = "Importance",
        y = "Feature"
    ) +
    theme_minimal(base_size = 16) +
    theme(
        plot.title.position = "plot",
        panel.grid.major.y = element_blank()
    )

If the measure in question needs to be maximized rather than minimized (like $R^2$), the internal importance calculation takes that into account via the $minimize property of the measure and calculates importances such that the intuition “performance improvement” -> “higher importance score” still holds:

pfi = PFI$new(
    task = task,
    learner = learner,
    measure = msr("regr.rsq")
)
#> ℹ No <Resampling> provided, using `resampling = rsmp("holdout", ratio = 2/3)`
#>   (test set size: 333)

pfi$compute()
pfi$importance()
#> Key: <feature>
#>          feature   importance
#>           <char>        <num>
#>  1:   important1  0.329915393
#>  2:   important2  0.297695022
#>  3:   important3  0.063613087
#>  4:   important4  0.493673768
#>  5:   important5  0.121794662
#>  6: unimportant1  0.003972813
#>  7: unimportant2  0.002157623
#>  8: unimportant3 -0.002780577
#>  9: unimportant4  0.001914150
#> 10: unimportant5  0.001366645

See vignette("xplainfi") for more examples.

Copy Link

Version

Install

install.packages('xplainfi')

Version

1.1.0

License

LGPL (>= 3)

Issues

Pull Requests

Stars

Forks

Maintainer

Lukas Burk

Last Published

February 26th, 2026

Functions in xplainfi (1.1.0)

PerturbationImportance

Perturbation Feature Importance Base Class
MarginalSAGE

Marginal SAGE
KnockoffSampler

Knockoff Sampler
MarginalReferenceSampler

Marginal Reference Sampler
sage_aggregate_predictions

Aggregate Predictions by Coalition and Test Instance
check_groups

Check group specification
op-null-default

Default value for NULL
rsmp_all_test

Create a resampling with all data being test data
sim_dgp_ewald

Simulate data as in Ewald et al. (2024)
wvim_design_matrix

Create Feature Selection Design Matrix
sim_dgp_scenarios

Simulation DGPs for Feature Importance Method Comparison
LOCO

Leave-One-Covariate-Out (LOCO)
MarginalSampler

Marginal Sampler Base Class
SAGE

Shapley Additive Global Importance (SAGE) Base Class
MarginalPermutationSampler

Marginal Permutation Sampler
PFI

Permutation Feature Importance
RFI

Relative Feature Importance
WVIM

Williamson's Variable Importance Measure (WVIM)
sage_batch_predict

Batch Predict for SAGE
print_bib

Print an Rd-formatted bib entry
xplain_opt

xplainfi Package Options
xplainfi-package

xplainfi: Feature Importance Methods for Global Explanations
ConditionalARFSampler

ARF-based Conditional Sampler
CFI

Conditional Feature Importance
ConditionalSampler

Conditional Feature Sampler
KnockoffGaussianSampler

Gaussian Knockoff Conditional Sampler
FeatureSampler

Feature Sampler Class
ConditionalGaussianSampler

Gaussian Conditional Sampler
FeatureImportanceMethod

Feature Importance Method Class
ConditionalSAGE

Conditional SAGE
ConditionalKNNSampler

k-Nearest Neighbors Conditional Sampler
ConditionalCtreeSampler

(experimental) Conditional Inference Tree Conditional Sampler