shapr

See the pkgdown site at norskregnesentral.github.io/shapr/ for a complete introduction with examples and documentation of the package.

NEWS

With shapr version 1.0.0 (GitHub only, Nov 2024) and version 1.0.1 (CRAN, Jan 2025), the package underwent a major update, with a full restructuring of the code base and a full suite of new functionality, including:

  • A long list of approaches for estimating the contribution/value function $v(S)$, including variational autoencoders and regression-based methods
  • Iterative Shapley value estimation with convergence detection
  • Parallelized computations with progress updates
  • Reweighted Kernel SHAP for faster convergence
  • New function explain_forecast() for explaining forecasts
  • Several other methodological, computational and user-experience improvements
  • Python wrapper making the core functionality of shapr available in Python

See the NEWS for a complete list.

Coming from shapr < 1.0.0?

shapr version >= 1.0.0 comes with a number of breaking changes. Most notably, we moved from using two functions (shapr() and explain()) to a single function (explain()). In addition, custom models are now explained by passing the prediction function directly to explain(), quite a few input arguments got new names, and a few functions for edge cases were removed to simplify the code base.

Click here to view a version of this README with old syntax (v0.2.2).
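
As a rough illustration of the custom-model workflow described above, the sketch below wraps a set of regression coefficients in a made-up class and hands a matching prediction function to explain() through its predict_model argument. The class name my_model and the helper predict_my_model are hypothetical names chosen for this example; the prediction function takes the model and the new data and returns a numeric vector. The call to explain() is skipped if shapr is not installed.

```r
# Data setup mirroring the README example, using base R data frames
data <- airquality[complete.cases(airquality), ]
x_var <- c("Solar.R", "Wind", "Temp", "Month")
x_train <- data[-(1:6), x_var]
y_train <- data[-(1:6), "Ozone"]
x_explain <- data[1:6, x_var]

# A stand-in "custom" model: lm coefficients stored in a list with a
# hypothetical class that shapr does not support natively
fit <- lm(y_train ~ ., data = x_train)
my_model <- structure(list(coef = coef(fit)), class = "my_model")

# Prediction function: takes the model and new data, returns a numeric vector
predict_my_model <- function(x, newdata) {
  feature_names <- names(x$coef)[-1] # drop the intercept
  X <- cbind(1, as.matrix(newdata)[, feature_names, drop = FALSE])
  as.numeric(X %*% x$coef)
}

# Pass the prediction function directly to explain()
if (requireNamespace("shapr", quietly = TRUE)) {
  explanation_custom <- shapr::explain(
    model = my_model,
    x_explain = x_explain,
    x_train = x_train,
    approach = "empirical",
    phi0 = mean(y_train),
    predict_model = predict_my_model
  )
}
```

Because shapr cannot read feature information from an unknown model class, it falls back on the feature classes found in the data.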

Python wrapper

We provide an (experimental) Python wrapper (shaprpy) which allows explaining Python models with the methodology implemented in shapr, directly from Python. The wrapper calls R internally, and therefore requires an installation of R. See here for installation instructions and examples.

The package

The shapr R package implements an enhanced version of the Kernel SHAP method, for approximating Shapley values, with a strong focus on conditional Shapley values. The core idea is to remain completely model-agnostic while offering a variety of methods for estimating contribution functions, enabling accurate computation of conditional Shapley values across different feature types, dependencies, and distributions. The package also includes evaluation metrics to compare various approaches. With features like parallelized computations, convergence detection, progress updates, and extensive plotting options, shapr is a highly efficient and user-friendly tool, delivering precise estimates of conditional Shapley values, which are critical for understanding how features truly contribute to predictions.

A basic example is provided below. Otherwise we refer to the pkgdown website and the different vignettes there for details and further examples.
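
To make the notion of a conditional Shapley value concrete, the following base R sketch computes exact values for a toy problem without using shapr. It assumes a linear model and jointly Gaussian features, so that the contribution function v(S) = E[f(X) | X_S = x_S] has a closed form. The coefficients, covariance matrix and observation are made up for illustration. shapr itself estimates v(S) for general models and approximates the sum over coalitions when there are many features.

```r
# Toy setup (all numbers are made up for illustration)
beta0 <- 1
beta <- c(2, -1, 0.5) # linear model f(x) = beta0 + sum(beta * x)
mu <- c(0, 0, 0) # feature means
Sigma <- matrix(c(1.0, 0.5, 0.2,
                  0.5, 1.0, 0.3,
                  0.2, 0.3, 1.0), nrow = 3) # feature covariance matrix
x <- c(1, 2, -1) # observation to explain
M <- length(x)

f <- function(z) beta0 + sum(beta * z)

# Contribution function v(S) = E[f(X) | X_S = x_S] for Gaussian features.
# For a linear f this equals f evaluated at the conditional mean.
v <- function(S) {
  if (length(S) == 0) return(f(mu))
  if (length(S) == M) return(f(x))
  Sbar <- setdiff(seq_len(M), S)
  cond_mean <- mu[Sbar] + drop(
    Sigma[Sbar, S, drop = FALSE] %*% solve(Sigma[S, S, drop = FALSE], x[S] - mu[S])
  )
  z <- x
  z[Sbar] <- cond_mean
  f(z)
}

# Exact Shapley values: weighted sum of marginal contributions over all
# coalitions S that do not contain feature j
phi <- sapply(seq_len(M), function(j) {
  others <- setdiff(seq_len(M), j)
  total <- 0
  for (mask in 0:(2^length(others) - 1)) {
    # Decode the integer mask into a subset of the other features
    S <- others[(mask %/% 2^(seq_along(others) - 1)) %% 2 == 1]
    w <- factorial(length(S)) * factorial(M - length(S) - 1) / factorial(M)
    total <- total + w * (v(sort(c(S, j))) - v(S))
  }
  total
})

phi0 <- v(integer(0)) # expected prediction without any features
print(phi)
print(all.equal(phi0 + sum(phi), f(x))) # efficiency check
```

By the efficiency property, phi0 plus the sum of the feature-wise values equals f(x), which is a quick sanity check for any Shapley value implementation.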

Installation

shapr is available on CRAN and can be installed in R as:

install.packages("shapr")

To install the development version of shapr, available on GitHub, use

remotes::install_github("NorskRegnesentral/shapr")

To also install all dependencies, use

remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE)

Example

shapr supports computation of Shapley values with any predictive model which takes a set of numeric features and produces a numeric outcome.

The following example shows how a simple xgboost model is trained using the airquality dataset, and how shapr explains the individual predictions.

We first enable parallel computation and progress updates with the following code chunk. These are optional, but recommended for improved performance and user friendliness, particularly for problems with many features.

# Enable parallel computation
# Requires the future and future.apply packages
future::plan("multisession", workers = 2) # Increase the number of workers for increased performance with many features

# Enable progress updates of the v(S)-computations
# Requires the progressr package
progressr::handlers(global = TRUE)
progressr::handlers("cli") # Using the cli package as backend (recommended for the estimates of the remaining time)

Here comes the actual example:

library(xgboost)
library(shapr)

data("airquality")
data <- data.table::as.data.table(airquality)
data <- data[complete.cases(data), ]

x_var <- c("Solar.R", "Wind", "Temp", "Month")
y_var <- "Ozone"

ind_x_explain <- 1:6
x_train <- data[-ind_x_explain, ..x_var]
y_train <- data[-ind_x_explain, get(y_var)]
x_explain <- data[ind_x_explain, ..x_var]

# Looking at the dependence between the features
cor(x_train)
#>            Solar.R       Wind       Temp      Month
#> Solar.R  1.0000000 -0.1243826  0.3333554 -0.0710397
#> Wind    -0.1243826  1.0000000 -0.5152133 -0.2013740
#> Temp     0.3333554 -0.5152133  1.0000000  0.3400084
#> Month   -0.0710397 -0.2013740  0.3400084  1.0000000

# Fitting a basic xgboost model to the training data
model <- xgboost(
  data = as.matrix(x_train),
  label = y_train,
  nrounds = 20,
  verbose = FALSE
)

# Specifying the phi_0, i.e. the expected prediction without any features
p0 <- mean(y_train)

# Computing the Shapley values with kernelSHAP accounting for feature dependence using
# the empirical (conditional) distribution approach with bandwidth parameter sigma = 0.1 (default)
explanation <- explain(
  model = model,
  x_explain = x_explain,
  x_train = x_train,
  approach = "empirical",
  phi0 = p0
)
#> Note: Feature classes extracted from the model contains NA.
#> Assuming feature classes from the data are correct.
#> Success with message:
#> max_n_coalitions is NULL or larger than or 2^n_features = 16, 
#> and is therefore set to 2^n_features = 16.
#> 
#> ── Starting `shapr::explain()` at 2025-03-26 06:47:04 ──────────────────────────
#> • Model class: <xgb.Booster>
#> • Approach: empirical
#> • Iterative estimation: FALSE
#> • Number of feature-wise Shapley values: 4
#> • Number of observations to explain: 6
#> • Computations (temporary) saved at:
#> '/tmp/RtmpwFUqgs/shapr_obj_150f03bfd7b8.rds'
#> 
#> ── Main computation started ──
#> 
#> ℹ Using 16 of 16 coalitions.

# Printing the Shapley values for the test data.
# For more information about the interpretation of the values in the table, see ?shapr::explain.
print(explanation$shapley_values_est)
#>    explain_id     none    Solar.R      Wind      Temp      Month
#>         <int>    <num>      <num>     <num>     <num>      <num>
#> 1:          1 43.08571 13.2117337  4.785645 -25.57222  -5.599230
#> 2:          2 43.08571 -9.9727747  5.830694 -11.03873  -7.829954
#> 3:          3 43.08571 -2.2916185 -7.053393 -10.15035  -4.452481
#> 4:          4 43.08571  3.3254595 -3.240879 -10.22492  -6.663488
#> 5:          5 43.08571  4.3039571 -2.627764 -14.15166 -12.266855
#> 6:          6 43.08571  0.4786417 -5.248686 -12.55344  -6.645738

# Finally we plot the resulting explanations
plot(explanation)
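
Shapley values satisfy the efficiency property: for each observation, the none column (phi0) plus the feature-wise values adds up to the model's prediction for that observation. A small sketch of that check, using the values copied from the printed table above so that it runs without refitting the model:

```r
# Shapley values copied from the printed explanation$shapley_values_est table
shap <- data.frame(
  none    = rep(43.08571, 6),
  Solar.R = c(13.2117337, -9.9727747, -2.2916185, 3.3254595, 4.3039571, 0.4786417),
  Wind    = c(4.785645, 5.830694, -7.053393, -3.240879, -2.627764, -5.248686),
  Temp    = c(-25.57222, -11.03873, -10.15035, -10.22492, -14.15166, -12.55344),
  Month   = c(-5.599230, -7.829954, -4.452481, -6.663488, -12.266855, -6.645738)
)

# Row sums give the predictions implied by the explanations
implied_pred <- rowSums(shap)
print(implied_pred)

# With the objects from the example in memory, these should agree (up to
# rounding of the printed values) with the model's own predictions, e.g.
# predict(model, as.matrix(x_explain))
```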

See the general usage vignette for further basic usage examples and brief introductions to the methodology. For more thorough information about the underlying methodology, see Aas, Jullum, and Løland (2021), Redelmeier, Jullum, and Aas (2020), Jullum, Redelmeier, and Aas (2021), Olsen et al. (2022), and Olsen et al. (2024). See also Sellereite and Jullum (2019) for a brief paper about the previous (< 1.0.0) version of the package.

Contribution

All feedback and suggestions are very welcome. Details on how to contribute can be found here. If you have any questions or comments, feel free to open an issue here.

Please note that the ‘shapr’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

References

Aas, Kjersti, Martin Jullum, and Anders Løland. 2021. “Explaining Individual Predictions When Features Are Dependent: More Accurate Approximations to Shapley Values.” Artificial Intelligence 298.

Jullum, Martin, Annabelle Redelmeier, and Kjersti Aas. 2021. “Efficient and Simple Prediction Explanations with groupShapley: A Practical Perspective.” In Proceedings of the 2nd Italian Workshop on Explainable Artificial Intelligence, 28–43. CEUR Workshop Proceedings.

Olsen, Lars Henry Berge, Ingrid Kristine Glad, Martin Jullum, and Kjersti Aas. 2022. “Using Shapley Values and Variational Autoencoders to Explain Predictive Models with Dependent Mixed Features.” Journal of Machine Learning Research 23 (213): 1–51.

Olsen, Lars Henry Berge, Ingrid Kristine Glad, Martin Jullum, and Kjersti Aas. 2024. “A Comparative Study of Methods for Estimating Model-Agnostic Shapley Value Explanations.” Data Mining and Knowledge Discovery, 1–48.

Redelmeier, Annabelle, Martin Jullum, and Kjersti Aas. 2020. “Explaining Predictive Models with Mixed Features Using Shapley Values and Conditional Inference Trees.” In International Cross-Domain Conference for Machine Learning and Knowledge Extraction, 117–37. Springer.

Sellereite, N., and M. Jullum. 2019. “Shapr: An r-Package for Explaining Machine Learning Models with Dependence-Aware Shapley Values.” Journal of Open Source Software 5 (46): 2027. https://doi.org/10.21105/joss.02027.

Package details

  • Version: 1.0.4
  • License: MIT + file LICENSE
  • Maintainer: Martin Jullum
  • Last published: April 28th, 2025
  • Monthly downloads: 2,303

Functions in shapr (1.0.4)

  • compute_shapley: Compute Shapley values
  • coalition_matrix_cpp: Get coalition matrix
  • compute_estimates: Computes the Shapley values and their standard deviation given the v(S)
  • cli_topline: Create a header topline with cli
  • compute_MSEv_eval_crit: Mean Squared Error of the Contribution Function v(S)
  • cli_iter: Printing messages in iterative procedure with cli
  • compute_vS: Computes v(S) for all feature subsets S.
  • cli_startup: Printing startup messages with cli
  • convert_feature_name_to_idx: Convert feature names into feature indices
  • compute_time: Gathers and computes the timing of the different parts of the explain function.
  • create_coalition_table: Define coalitions, and fetch additional information about each unique coalition
  • correction_matrix_cpp: Correction term with trace_input in AICc formula
  • exact_coalition_table: Get table with all (exact) coalitions
  • create_marginal_data_gaussian: Generate marginal Gaussian data using Cholesky decomposition
  • create_ctree: Build all the conditional inference trees
  • create_marginal_data_cat: Create marginal categorical data for causal Shapley values
  • explain: Explain the output of machine learning models with dependence-aware (conditional/observational) Shapley values
  • create_marginal_data_training: Function that samples data from the empirical marginal training distribution
  • gauss_cat_loss: A torch::nn_module() Representing a gauss_cat_loss
  • default_doc_internal: Unexported documentation helper function.
  • default_doc_export: Exported documentation helper function.
  • gauss_cat_sampler_most_likely: A torch::nn_module() Representing a gauss_cat_sampler_most_likely
  • get_S_causal_steps: Get the steps for generating MC samples for coalitions following a causal ordering
  • get_cov_mat: get_cov_mat
  • gauss_cat_sampler_random: A torch::nn_module() Representing a gauss_cat_sampler_random
  • get_extra_parameters: This includes both extra parameters and other objects
  • get_extra_comp_args_default: Gets the default values for the extra computation arguments
  • explain_forecast: Explain a forecast from time series models with dependence-aware (conditional/observational) Shapley values
  • finalize_explanation: Gathers the final output to create the explanation object
  • group_forecast_setup: Set up user provided groups for explanation in a forecast model.
  • get_data_forecast: Set up data for explain_forecast
  • gauss_cat_parameters: A torch::nn_module() Representing a gauss_cat_parameters
  • hat_matrix_cpp: Computing single H matrix in AICc-function using the Mahalanobis distance
  • mcar_mask_generator: Missing Completely at Random (MCAR) Mask Generator
  • get_data_specs: Fetches feature information from a given data set
  • get_supported_models: Provides a data.table with the supported models
  • gaussian_transform_separate: Transforms new data to standardized normal (dimension 1) based on other data transformations
  • get_valid_causal_coalitions: Get all coalitions satisfying the causal ordering
  • get_output_args_default: Gets the default values for the output arguments
  • mahalanobis_distance_cpp: (Generalized) Mahalanobis distance
  • get_iterative_args_default: Function to specify arguments of the iterative estimation procedure
  • inv_gaussian_transform_cpp: Transforms new data to a standardized normal distribution
  • get_model_specs: Fetches feature information from natively supported models
  • get_mu_vec: get_mu_vec
  • lag_data: Lag a matrix of variables a specific number of lags for each variable.
  • get_feature_specs: Gets the feature specifications from the model
  • get_max_n_coalitions_causal: Get the number of coalitions that respect the causal ordering
  • gaussian_transform: Transforms a sample to standardized normal distribution
  • plot_vaeac_eval_crit: Plot the training VLB and validation IWAE for vaeac models
  • memory_layer: A torch::nn_module() Representing a Memory Layer
  • get_predict_model: Get predict_model function
  • get_supported_approaches: Gets the implemented approaches
  • observation_impute: Generate permutations of training data using test observations
  • plot_vaeac_imputed_ggpairs: Plot Pairwise Plots for Imputed and True Data
  • observation_impute_cpp: Get imputed data
  • prepare_data_copula_cpp_caus: Generate (Gaussian) Copula MC samples for the causal setup with a single MC sample for each explicand
  • print.shapr: Print method for shapr objects
  • model_checker: Check that the type of model is supported by the native implementation of the model class
  • prepare_data: Generate data used for predictions and Monte Carlo integration
  • prepare_next_iteration: Prepares the next iteration of the iterative sampling algorithm
  • predict_model: Generate predictions for input data with specified model
  • print_iter: Prints iterative information
  • plot_SV_several_approaches: Shapley value bar plots for several explanation objects
  • plot_MSEv_eval_crit: Plots of the MSEv Evaluation Criterion
  • prepare_data_gaussian_cpp: Generate Gaussian MC samples
  • plot.shapr: Plot of the Shapley value explanations
  • paired_sampler: Sampling Paired Observations
  • prepare_data_gaussian_cpp_caus: Generate Gaussian MC samples for the causal setup with a single MC sample for each explicand
  • prepare_data_copula_cpp: Generate (Gaussian) Copula MC samples
  • regression.get_string_to_R: Convert the string into an R object
  • regression.check_sur_n_comb: Check the regression.surrogate_n_comb parameter
  • prepare_data_causal: Generate data used for predictions and Monte Carlo integration for causal Shapley values
  • regression.get_tune: Get if model is to be tuned
  • regression.check_recipe_func: Check regression.recipe_func
  • release_questions: Auxiliary function for the vignettes
  • process_factor_data: Treat factors as numeric values
  • quantile_type7_cpp: Compute the quantiles using quantile type seven
  • regression.get_y_hat: Get the predicted responses
  • regression.check_parameters: Check regression parameters
  • regression.surrogate_aug_data: Augment the training data and the explicands
  • reg_forecast_setup: Set up exogenous regressors for explanation in a forecast model.
  • regression.check_namespaces: Check that needed libraries are installed
  • rss_cpp: Function for computing sigma_hat_sq
  • regression.train_model: Train a tidymodels model via workflows
  • sample_coalition_table: Get table with sampled coalitions using the semi-deterministic sampling approach
  • prepare_data_single_coalition: Compute the conditional probabilities for a single coalition for the categorical approach
  • regression.check_vfold_cv_para
  • regression.cv_message: Produce message about which batch prepare_data is working on
  • sample_coalitions_cpp_str_paired: We here return a vector of strings/characters, i.e., a CharacterVector, where each string is a space-separated list of integers.
  • sample_ctree: Sample ctree variables from a given conditional inference tree
  • sample_combinations: Helper function to sample a combination of training and testing rows, which does not risk getting the same observation twice. Need to improve this help file.
  • save_results: Saves the intermediate results to disk
  • specified_masks_mask_generator: A torch::nn_module() Representing a specified_masks_mask_generator
  • vaeac_check_logicals: Function that checks logicals
  • vaeac_categorical_parse_params: Creates Categorical Distributions
  • shapr-package: shapr: Prediction Explanation with Dependence-Aware Shapley Values
  • shapley_setup: Set up the kernelSHAP framework
  • vaeac: Initializing a vaeac model
  • specified_prob_mask_generator: A torch::nn_module() Representing a specified_prob_mask_generator
  • shapley_weights: Calculate Shapley weight
  • vaeac_check_extra_named_list: Check vaeac.extra_parameters list
  • skip_connection: A torch::nn_module() Representing a skip connection
  • test_predict_model: Model testing function
  • vaeac_check_epoch_values: Function that checks provided epoch arguments
  • setup_approach: Set up the framework for the chosen approach
  • setup: check_setup
  • vaeac_check_positive_integers: Function that checks positive integers
  • vaeac_check_cuda: Function that checks for access to CUDA
  • vaeac_check_mask_gen: Function that checks the specified masking scheme
  • vaeac_check_activation_func: Function that checks the provided activation function
  • vaeac_check_positive_numerics: Function that checks positive numerics
  • vaeac_dataset: Dataset used by the vaeac model
  • vaeac_check_which_vaeac_model: Function that checks for valid vaeac model name
  • vaeac_extend_batch: Extends Incomplete Batches by Sampling Extra Data from Dataloader
  • vaeac_check_save_parameters: Function that gives a warning about disk usage
  • testing_cleanup: Cleans out certain output arguments to allow perfect reproducibility of the output
  • vaeac_compute_normalization: Compute Featurewise Means and Standard Deviations
  • vaeac_check_x_colnames: Function that checks the feature names of data and vaeac model
  • vaeac_check_masking_ratio: Function that checks that the masking ratio argument is valid
  • vaeac_get_n_decimals: Function to get string of values with specific number of decimals
  • vaeac_get_model_from_checkp: Function to load a vaeac model and set it in the right state and mode
  • vaeac_check_parameters: Function that calls all vaeac parameters check functions
  • vaeac_kl_normal_normal: Compute the KL Divergence Between Two Gaussian Distributions.
  • vaeac_get_extra_para_default: Function to specify the extra parameters in the vaeac model
  • vaeac_impute_missing_entries: Impute Missing Values Using Vaeac
  • vaeac_normalize_data: Normalize mixed data for vaeac
  • vaeac_get_evaluation_criteria: Extract the Training VLB and Validation IWAE from a list of explanation objects using the vaeac approach
  • vaeac_normal_parse_params: Creates Normal Distributions
  • vaeac_get_current_save_state: Function that extracts additional objects from the environment to the state list
  • vaeac_get_x_explain_extended: Function to extend the explicands and apply all relevant masks/coalitions
  • vaeac_get_data_objects: Function to set up data loaders and save file names
  • vaeac_get_val_iwae: Compute the Importance Sampling Estimator (Validation Error)
  • vaeac_postprocess_data: Postprocess Data Generated by a vaeac Model
  • vaeac_check_probabilities: Function that checks probabilities
  • vaeac_get_full_state_list: Function that extracts the state list objects from the environment
  • vaeac_get_mask_generator_name: Function that determines which mask generator to use
  • vaeac_check_save_names: Function that checks that the save folder exists and for a valid file name
  • vaeac_get_save_file_names: Function that creates the save file names for the vaeac model
  • vaeac_get_optimizer: Function to create the optimizer used to train vaeac
  • vaeac_preprocess_data: Preprocess Data for the vaeac approach
  • vaeac_train_model_continue: Continue to Train the vaeac Model
  • vaeac_update_pretrained_model: Function that checks and adds a pre-trained vaeac model
  • vaeac_train_model_auxiliary: Function used to train a vaeac model
  • weight_matrix: Calculate weighted matrix
  • vaeac_train_model: Train the Vaeac Model
  • vaeac_update_para_locations: Move vaeac parameters to correct location
  • weight_matrix_cpp: Calculate weight matrix
  • vaeac_print_train_summary: Function to print out a training summary for the vaeac model
  • vaeac_save_state: Function that saves the state list and the current save state of the vaeac model
  • aicc_full_cpp: AICc formula for several sets, alternative definition
  • check_categorical_valid_MCsamp: Check that all explicands have at least one valid MC sample in causal Shapley values
  • additional_regression_setup: Additional setup for regression-based methods
  • check_convergence: Checks the convergence according to the convergence threshold
  • cli_compute_vS: Printing messages in compute_vS with cli
  • categorical_to_one_hot_layer: A torch::nn_module() Representing a categorical_to_one_hot_layer
  • check_groups: Check that the group parameter has the right form and content
  • aicc_full_single_cpp: Temp-function for computing the full AICc with several X's etc
  • append_vS_list: Appends the new vS_list to the prev vS_list
  • check_verbose: Function that checks the verbose parameter