
MLwrap (version 0.1.0)

sensitivity_analysis: Perform Sensitivity Analysis and Interpretable ML methods

Description

As the final step in the MLwrap package workflow, this function performs Sensitivity Analysis (SA) on a fitted ML model stored in an analysis_object (wrap_object in the examples below). It evaluates feature importance using methods such as Permutation Feature Importance (PFI), SHAP (SHapley Additive exPlanations), Integrated Gradients, Olden sensitivity analysis, and Sobol indices. The function generates numerical results and visualizations (e.g., bar plots, box plots, beeswarm plots) that help interpret the impact of each feature on the model's predictions for both regression and classification tasks, providing critical insights after model training and evaluation.

Usage

sensitivity_analysis(analysis_object, methods = c("PFI"), metric = NULL)

Value

An updated analysis_object with the results of the sensitivity analysis stored in the sensitivity_analysis slot as a list. Each method's results are accessible under named elements (e.g., sensitivity_analysis[["PFI"]]). Additionally, the function produces various plots (bar plots, box plots, beeswarm plots) for visual interpretation of feature importance, tailored to the task type and number of outcome levels, completing the MLwrap workflow with actionable model insights.
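For instance, once the SHAP method has been run, the stored results can be tabulated and plotted with the helper functions used in the Examples below (a minimal sketch assuming a tuned wrap_object):

wrap_object <- sensitivity_analysis(wrap_object, methods = "SHAP")
table_shap  <- table_shap_results(wrap_object)   # numerical SHAP summary
plot_shap(wrap_object)                           # SHAP plots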

Arguments

analysis_object

An analysis_object created by the fine_tuning() function.

methods

Method to be used. A string giving the method name: "PFI" (Permutation Feature Importance), "SHAP" (SHapley Additive exPlanations), "Integrated Gradients" (Neural Network only), "Olden" (Neural Network only), or "Sobol_Jansen" (only when all input features are continuous).

metric

Metric used for the "PFI" method (Permutation Feature Importance). A string with the name of the metric (see Metrics).
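For example, with a tuned analysis_object, PFI can be requested with an explicit metric; "rmse" is used here because it matches the metric chosen in the Examples below, but any metric listed under Metrics is a valid choice:

wrap_object <- sensitivity_analysis(wrap_object,
                                    methods = "PFI",
                                    metric  = "rmse")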

Details

Following the steps of data preprocessing, model fitting, and performance assessment in the MLwrap pipeline, sensitivity_analysis() processes the training and test data using the preprocessing recipe stored in the analysis_object, applies the specified SA methods, and stores the results within the analysis_object. It supports different metrics for evaluation and handles multi-class classification by producing class-specific analyses and plots, ensuring a comprehensive understanding of model behavior (Iooss & Lemaître, 2015).

As the concluding phase of the MLwrap workflow—after data preparation, model training, and evaluation—this function enables users to interpret their models by quantifying and visualizing feature importance. It first validates the input arguments using check_args_sensitivity_analysis(). Then, it preprocesses the training and test data using the recipe stored in analysis_object$transformer. Depending on the specified methods, it calculates feature importance using:

  • PFI (Permutation Feature Importance): Assesses importance by shuffling feature values and measuring the resulting change in model performance (using the specified or default metric); a minimal standalone sketch of this idea is shown after this list.

  • SHAP (SHapley Additive exPlanations): Computes SHAP values to explain individual predictions by attributing contributions to each feature.

  • Integrated Gradients: Evaluates feature importance by integrating gradients of the model's output with respect to input features.

  • Olden: Calculates sensitivity based on connection weights, typically for neural network models, to determine feature contributions.

  • Sobol_Jansen: Performs variance-based global sensitivity analysis by decomposing the model output variance into contributions from individual features and their interactions, quantifying how much each feature and each combination of features accounts for the variability in predictions. It estimates first-order and total-order Sobol' sensitivity indices simultaneously using the Jansen (1999) Monte Carlo estimator, and it is available only for continuous outcomes (not categorical) and only when all input features are continuous; a standalone numerical sketch of the estimator appears at the end of this section.
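To make the permutation idea behind PFI concrete, here is a standalone base-R sketch using lm() and the built-in mtcars data (purely illustrative; this is not MLwrap's internal code):

set.seed(1)
fit  <- lm(mpg ~ wt + hp + disp, data = mtcars)
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
baseline <- rmse(mtcars$mpg, predict(fit, mtcars))

sapply(c("wt", "hp", "disp"), function(feat) {
  shuffled <- mtcars
  shuffled[[feat]] <- sample(shuffled[[feat]])          # break the feature-outcome link
  rmse(mtcars$mpg, predict(fit, shuffled)) - baseline   # importance = increase in RMSE
})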

For classification tasks with more than two outcome levels, the function generates separate results and plots for each class. Visualizations include bar plots for importance metrics, box plots for distribution of values, and beeswarm plots for detailed feature impact across observations. All results are stored in the analysis_object under the sensitivity_analysis slot, finalizing the MLwrap pipeline with a deep understanding of model drivers.
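For readers who want to see the Jansen (1999) estimator itself, the following base-R sketch computes first-order and total-order Sobol' indices for a toy additive model (the model, sample size, and variable names are illustrative and unrelated to MLwrap internals):

set.seed(42)
n <- 10000
f <- function(X) 2 * X[, 1] + X[, 2] + 0.1 * X[, 3]     # toy model without interactions

A  <- matrix(runif(n * 3), ncol = 3)                    # two independent input samples
B  <- matrix(runif(n * 3), ncol = 3)
fA <- f(A); fB <- f(B)
varY <- var(c(fA, fB))

indices <- t(sapply(1:3, function(i) {
  ABi <- A; ABi[, i] <- B[, i]                          # A with column i taken from B
  fABi <- f(ABi)
  c(first = (varY - mean((fB - fABi)^2) / 2) / varY,    # first-order index (Jansen)
    total = mean((fA - fABi)^2) / 2 / varY)              # total-order index (Jansen)
}))
rownames(indices) <- paste0("X", 1:3)
round(indices, 2)   # with no interactions, first-order and total-order agree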

References

Iooss, B., & Lemaître, P. (2015). A review on global sensitivity analysis methods. In C. Meloni & G. Dellino (Eds.), Uncertainty Management in Simulation-Optimization of Complex Systems: Algorithms and Applications (pp. 101-122). Springer. https://doi.org/10.1007/978-1-4899-7547-8_5

Jansen, M. J. W. (1999). Analysis of variance designs for model output. Computer Physics Communications, 117(1-2), 35–43. https://doi.org/10.1016/S0010-4655(98)00154-4

Examples

# Example: Using PFI and SHAP

library(MLwrap)

data(sim_data) # sim_data is a simulated dataset with psychological variables

wrap_object <- preprocessing(
  df = sim_data,
  formula = psych_well ~ depression + emot_intel + resilience + life_sat,
  task = "regression"
)

wrap_object <- build_model(
  analysis_object = wrap_object,
  model_name = "Random Forest",
  hyperparameters = list(
    mtry = 3,
    trees = 20
  )
)

wrap_object <- fine_tuning(wrap_object,
  tuner = "Grid Search CV",
  metrics = c("rmse")
)


wrap_object <- sensitivity_analysis(wrap_object, methods = "SHAP")

# Extracting Results

table_shap <- table_shap_results(wrap_object)

# Plotting SHAP Results

wrap_object %>%
    plot_shap()
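
# Sketch (not from the package examples): PFI could be requested in the same
# way; with metric left NULL the default metric described in Details is used.

wrap_object <- sensitivity_analysis(wrap_object, methods = "PFI")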
