Learn R Programming

BORG (version 0.2.5)

audit_importance: Audit Feature Importance Calculations

Description

Detects when feature importance (SHAP, permutation importance, etc.) is computed using test data, which can lead to biased feature selection and data leakage.

Usage

audit_importance(
  importance,
  data,
  train_idx,
  test_idx,
  method = "auto",
  model = NULL
)

Value

A BorgRisk object with audit results.

Arguments

importance

A vector, matrix, or data frame of importance values.

data

The data used to compute importance.

train_idx

Integer vector of training indices.

test_idx

Integer vector of test indices.

method

Character indicating the importance method. One of "shap", "permutation", "gain", "impurity", or "auto" (default).

model

Optional fitted model object for additional validation.

Details

Feature importance computed on test data is a form of data leakage because:

  • SHAP values computed on test data reveal test set structure

  • Permutation importance on test data uses test labels

  • Feature selection based on test importance leads to overfit models

This function checks if the data used for importance calculation includes test indices and flags potential violations.

Examples

Run this code
set.seed(42)
data <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
train_idx <- 1:70
test_idx <- 71:100

# Simulate importance values
importance <- c(x1 = 0.6, x2 = 0.4)

# Good: importance computed on training data
result <- audit_importance(importance, data[train_idx, ], train_idx, test_idx)

# Bad: importance computed on full data (includes test)
result_bad <- audit_importance(importance, data, train_idx, test_idx)

Run the code above in your browser using DataLab