Implementation of Conditional Feature Importance (CFI) using a modular sampling approach.
xplainfi::FeatureImportanceMethod -> xplainfi::PerturbationImportance -> CFI
new(): Creates a new instance of the CFI class.
CFI$new(
task,
learner,
measure = NULL,
resampling = NULL,
features = NULL,
groups = NULL,
relation = "difference",
n_repeats = 30L,
batch_size = NULL,
sampler = NULL
)

task, learner, measure, resampling, features, groups, relation, n_repeats, batch_size
Passed to PerturbationImportance.
sampler
(ConditionalSampler) Optional custom sampler. Defaults to instantiating a ConditionalARFSampler internally with default parameters.
compute(): Compute CFI scores.
CFI$compute(
n_repeats = NULL,
batch_size = NULL,
store_models = TRUE,
store_backends = TRUE
)

n_repeats
(integer(1)) Number of permutation iterations. If NULL, uses the stored value.

batch_size
(integer(1) | NULL; default NULL) Maximum number of rows to predict at once. If NULL, uses the stored value.
store_models, store_backends
(logical(1); default TRUE) Whether to store fitted models / data backends, passed to mlr3::resample internally for the initial fit of the learner. Some measures require these, so it is recommended to leave them enabled unless disabling them is strictly necessary (e.g. to reduce memory use).
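As a sketch of how these arguments interact, the stored settings can be overridden per call; this assumes a CFI object `cfi` constructed as in the example below.

```r
# Sketch: override stored settings for a single computation.
# Disabling stored models/backends reduces memory use, but note that
# some measures may require them (see above).
cfi$compute(
  n_repeats = 10L,        # overrides the value stored at construction
  batch_size = 1000L,     # cap prediction batch size
  store_models = FALSE,
  store_backends = FALSE
)
```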
clone(): The objects of this class are cloneable with this method.
CFI$clone(deep = FALSE)

deep
Whether to make a deep clone.
CFI replaces feature values with conditional samples from the distribution of the feature given the other features. Any ConditionalSampler or KnockoffSampler can be used.
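For instance, a knockoff-based sampler can be swapped in at construction time. The sketch below uses KnockoffGaussianSampler, named in the inference section; its constructor signature is assumed to mirror the other samplers (taking the task).

```r
library(mlr3)
task <- sim_dgp_correlated(n = 200)

# Assumed constructor signature, mirroring the other samplers
knockoff_sampler <- KnockoffGaussianSampler$new(task)

cfi_knockoff <- CFI$new(
  task = task,
  learner = lrn("regr.rpart"),
  measure = msr("regr.mse"),
  sampler = knockoff_sampler,
  resampling = rsmp("holdout")  # holdout keeps CPI inference valid
)
```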
Two approaches for statistical inference are primarily supported via
$importance(ci_method = "cpi"):
CPI (Watson & Wright, 2021): The original Conditional Predictive Impact method, designed for use with knockoff samplers (KnockoffGaussianSampler).
cARFi (Blesch et al., 2025): CFI with ARF-based conditional sampling (ConditionalARFSampler), using the same CPI inference framework.
Both require a decomposable measure (e.g., MSE) and out-of-sample evaluation.
CPI inference is guaranteed to be valid with holdout (a single train/test split).
With cross-validation, test observations are i.i.d. but models are fit on
overlapping training data, which may affect inference coverage. With bootstrap
or subsampling, both non-i.i.d. test observations and overlapping training data
can be an issue. See vignette("inference", package = "xplainfi") for details.
Available tests: "t" (t-test), "wilcoxon" (signed-rank), "fisher" (permutation),
"binomial" (sign test). The Fisher test is recommended.
Method-agnostic inference methods ("raw", "nadeau_bengio", "quantile") are also
available; see FeatureImportanceMethod for details.
For a comprehensive overview of inference methods including usage examples,
see vignette("inference", package = "xplainfi").
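Putting this together, a CPI-style inference workflow with a holdout split might look like the following sketch, using only the `ci_method = "cpi"` interface documented above.

```r
library(mlr3)
task <- sim_dgp_correlated(n = 200)

cfi <- CFI$new(
  task = task,
  learner = lrn("regr.rpart"),
  measure = msr("regr.mse"),     # decomposable measure, as required
  resampling = rsmp("holdout"),  # single train/test split: CPI inference is valid
  n_repeats = 5
)
cfi$compute()

# CPI inference; see vignette("inference", package = "xplainfi") for
# the available tests and their trade-offs
cfi$importance(ci_method = "cpi")
```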
Watson D, Wright M (2021). "Testing Conditional Independence in Supervised Learning Algorithms." Machine Learning, 110(8), 2107-2129. doi:10.1007/s10994-021-06030-6.
Blesch K, Koenen N, Kapar J, Golchian P, Burk L, Loecher M, Wright M (2025). "Conditional Feature Importance with Generative Modeling Using Adversarial Random Forests." Proceedings of the AAAI Conference on Artificial Intelligence, 39(15), 15596-15604. doi:10.1609/aaai.v39i15.33712.
library(mlr3)
task <- sim_dgp_correlated(n = 200)
# Using a ConditionalGaussianSampler instead of the default ConditionalARFSampler
cfi <- CFI$new(
task = task,
learner = lrn("regr.rpart"),
measure = msr("regr.mse"),
sampler = ConditionalGaussianSampler$new(task),
n_repeats = 5
)
cfi$compute()
cfi$importance()