fastml (version 0.5.0)

fastexplain: Explain a fastml_model (DALEX + SHAP + Permutation-based VI)

Description

Provides model explainability using DALEX. This function:

  • Creates a DALEX explainer.

  • Computes permutation-based variable importance (with boxplots showing variability across permutations) and displays the resulting table and plot.

  • Computes partial dependence-like model profiles if `features` are provided.

  • Computes Shapley values (SHAP) for a sample of the training observations, displays the SHAP table, and plots a summary bar chart of \(\text{mean}(\vert \text{SHAP value} \vert)\) per feature. For classification, it shows separate bars for each class.
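
The per-feature summary statistic behind the SHAP bar chart can be reproduced directly from DALEX output. The sketch below is illustrative only: fastexplain() builds the explainer internally from the fastml_model, so a plain glm on iris stands in here, and the column names variable_name / contribution assume the current DALEX::predict_parts() output.

library(DALEX)

# Stand-in model and explainer (fastexplain() constructs these internally)
d <- iris[, 1:4]
d$virginica <- as.numeric(iris$Species == "virginica")
fit <- glm(virginica ~ ., data = d, family = binomial)
explainer <- DALEX::explain(fit, data = d[, 1:4], y = d$virginica, verbose = FALSE)

# SHAP-style decompositions for a few observations (cf. shap_sample)
shap_df <- do.call(rbind, lapply(1:5, function(i) {
  DALEX::predict_parts(explainer, new_observation = d[i, 1:4], type = "shap")
}))

# mean(|SHAP value|) per feature -- the quantity plotted in the summary bar chart
aggregate(abs(contribution) ~ variable_name, data = shap_df, FUN = mean)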

Usage

fastexplain(
  object,
  method = "dalex",
  features = NULL,
  grid_size = 20,
  shap_sample = 5,
  vi_iterations = 10,
  seed = 123,
  loss_function = NULL,
  ...
)
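
For orientation, a minimal call looks like the sketch below. The training step is shown only for context and assumes the package's fastml() training function with a data/label interface; substitute whatever call actually produced your fastml_model.

library(fastml)

# Train a model first (illustrative; any object of class fastml_model works)
model <- fastml(data = iris, label = "Species")

# Print DALEX explanations with the default settings
fastexplain(model)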

Value

Prints DALEX explanations: variable importance table & plot, model profiles (if any), and SHAP table & summary plot.

Arguments

object

A fastml_model object.

method

Currently only "dalex" is supported.

features

Character vector of feature names for partial dependence (model profiles). Default NULL.

grid_size

Number of grid points for partial dependence. Default 20.

shap_sample

Integer. Number of observations from the processed training data for which SHAP values are computed. Default 5.

vi_iterations

Integer. Number of permutations for variable importance (B). Default 10.

seed

Integer. Random seed for reproducibility. Default 123.

loss_function

Function. The loss function passed to DALEX::model_parts() for permutation-based variable importance; see the sketch after this argument list.

  • If NULL and task = 'classification', defaults to DALEX::loss_cross_entropy.

  • If NULL and task = 'regression', defaults to DALEX::loss_root_mean_square.

...

Additional arguments (not currently used).
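
Combining the arguments above, a more customized call might look like the following sketch. Here model is the fastml_model from the Usage example, and the feature names are illustrative.

# Partial-dependence profiles for two features, more VI permutations,
# more SHAP observations, and an explicit loss for the permutation importance
fastexplain(
  model,
  features      = c("Petal.Length", "Petal.Width"),
  grid_size     = 30,
  shap_sample   = 10,
  vi_iterations = 25,
  loss_function = DALEX::loss_cross_entropy  # e.g. DALEX::loss_one_minus_auc for a binary task
)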

Details

  1. Custom number of permutations for VI (vi_iterations):

    You can now specify how many permutations (B) to use for permutation-based variable importance. More permutations yield more stable estimates but take longer.

  2. Better error messages and checks:

    The function performs additional input checks and raises clearer error messages when required packages are missing or other preconditions are not met.

  3. Loss Function:

    A loss_function argument has been added to let you pick a different performance measure (e.g., loss_cross_entropy for classification, loss_root_mean_square for regression).

  4. Parallelization Suggestion: