fastml (version 0.7.7)

explain_stability: Analyze Feature Importance Stability Across Cross-Validation Folds

Description

Computes feature importance for each fold model and aggregates the results to assess how stable feature importance rankings are across resamples. This helps identify features that are consistently important versus those whose importance varies across data subsets.

Usage

explain_stability(
  object,
  model_name = NULL,
  vi_iterations = 10,
  seed = 123,
  plot = TRUE,
  conf_level = 0.95
)

Value

A list with class "fastml_stability" containing:

importance_summary

Data frame with aggregated feature importance (mean, sd, se, lower/upper CI) across folds.

fold_importance

List of per-fold variable importance results.

rank_stability

Data frame showing how feature ranks vary across folds.

n_folds

Number of folds analyzed.

model_name

Name of the model analyzed.

Arguments

object

A fastml object trained with store_fold_models = TRUE.

model_name

Character string specifying which model to analyze. If NULL, uses the best model. Should match the format "algorithm (engine)", e.g., "rand_forest (ranger)".

vi_iterations

Integer. Number of permutations for variable importance per fold. Default is 10 for faster computation across many folds.

seed

Integer. Random seed for reproducibility.

plot

Logical. If TRUE (default), displays a stability plot showing mean importance with confidence intervals.

conf_level

Numeric. Confidence level for intervals. Default is 0.95.

Details

This function requires that the fastml model was trained with store_fold_models = TRUE, which stores the models fitted on each cross-validation fold. Without stored fold models, only the final best model is available, and cross-fold stability analysis is not possible.

The stability analysis computes permutation-based variable importance for each fold's model using DALEX, then aggregates across folds to show:

  • Mean importance and standard deviation

  • Confidence intervals for importance

  • Rank stability (how consistently features rank across folds)

Features with high mean importance but also high variance may be important for some data subsets but not others, suggesting potential instability in the model's reliance on those features.
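To make the aggregation step concrete, here is a minimal base-R sketch (with hypothetical per-fold importance numbers, independent of fastml's actual implementation) of how mean importance and a normal-approximation confidence interval can be computed from per-fold values:

```r
# Hypothetical per-fold permutation importances for two features
# (five folds each); illustrative numbers only.
fold_imp <- data.frame(
  feature = rep(c("Petal.Length", "Sepal.Width"), each = 5),
  importance = c(0.42, 0.45, 0.40, 0.44, 0.43,   # stable feature
                 0.10, 0.25, 0.05, 0.30, 0.08)   # unstable feature
)

conf_level <- 0.95
z <- qnorm(1 - (1 - conf_level) / 2)  # ~1.96 for a 95% CI

# Aggregate across folds: mean, sd, standard error, and CI bounds
agg <- do.call(rbind, lapply(split(fold_imp, fold_imp$feature), function(d) {
  m  <- mean(d$importance)
  s  <- sd(d$importance)
  se <- s / sqrt(nrow(d))
  data.frame(feature = d$feature[1], mean = m, sd = s, se = se,
             lower = m - z * se, upper = m + z * se)
}))
agg
```

A wide interval relative to the mean (as for the second feature above) is the signature of an unstable importance estimate.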

Examples

# \donttest{
# Train a model with fold models stored
library(fastml)
model <- fastml(
  data = iris,
  label = "Species",
  algorithms = "rand_forest",
  store_fold_models = TRUE
)

# Analyze stability
stability <- explain_stability(model)
print(stability)
# }
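To illustrate the rank-stability idea from the Details section, the following base-R sketch (hypothetical numbers, not fastml output) ranks features within each fold and summarizes how much each feature's rank moves across folds:

```r
# Hypothetical importance matrix: rows = folds, columns = features
imp <- rbind(
  c(Petal.Length = 0.42, Petal.Width = 0.38, Sepal.Width = 0.10),
  c(Petal.Length = 0.45, Petal.Width = 0.20, Sepal.Width = 0.30),
  c(Petal.Length = 0.40, Petal.Width = 0.35, Sepal.Width = 0.05)
)

# Rank features within each fold (1 = most important)
ranks <- t(apply(-imp, 1, rank))

# Summarize rank spread per feature: a low sd means a stable rank
rank_stability <- data.frame(
  feature   = colnames(imp),
  mean_rank = colMeans(ranks),
  sd_rank   = apply(ranks, 2, sd)
)
rank_stability
```

Here Petal.Length ranks first in every fold (zero rank spread), while the other two features trade places across folds, which is the pattern the rank_stability component of the returned object is designed to surface.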
