shapr

See the pkgdown site at norskregnesentral.github.io/shapr/ for a complete introduction with examples and documentation of the package.

For an overview of the methodology and capabilities of the package (per shapr v1.0.4), see the software paper Jullum et al. (2025), available as a preprint here.

NEWS

With shapr version 1.0.0 (GitHub only, Nov 2024) and version 1.0.1 (CRAN, Jan 2025), the package underwent a major update, with a complete restructuring of the code base and a broad suite of new functionality, including:

  • A long list of approaches for estimating the contribution/value function $v(S)$, including Variational Autoencoders and regression-based methods
  • Iterative Shapley value estimation with convergence detection (a short sketch is given after this list)
  • Parallelized computations with progress updates
  • Reweighted Kernel SHAP for faster convergence
  • New function explain_forecast() for explaining forecasts
  • Asymmetric and causal Shapley values
  • Several other methodological, computational and user-experience improvements
  • Python wrapper shaprpy making the core functionality of shapr available in Python

See the NEWS for a complete list.
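
As a quick illustration of the iterative estimation listed above, the sketch below fits a small linear model and requests iterative Shapley value estimation. This is a minimal sketch, not taken from the package examples: the simulated data and argument values are made up, and the iterative argument is assumed to be the switch documented for explain() in shapr >= 1.0.0 (max_n_coalitions is the coalition cap also referred to in the example output further down).

# Hedged sketch (not from this README): iterative Shapley value estimation
# with convergence detection. The `iterative` argument is assumed to be the
# documented switch in shapr >= 1.0.0; the data and settings are illustrative.
library(shapr)

set.seed(123)
n <- 500
p <- 10 # ten features, so 2^10 = 1024 coalitions in total
x <- as.data.frame(matrix(rnorm(n * p), ncol = p))
names(x) <- paste0("X", seq_len(p))
y <- rowSums(x[, 1:5]) + rnorm(n)
df <- cbind(y = y, x)

ind_x_explain <- 1:6
model_lm <- lm(y ~ ., data = df[-ind_x_explain, ])

explanation_iter <- explain(
  model = model_lm,
  x_explain = x[ind_x_explain, ],
  x_train = x[-ind_x_explain, ],
  approach = "gaussian",
  phi0 = mean(y[-ind_x_explain]),
  iterative = TRUE,        # assumed argument: stop once the estimates have converged
  max_n_coalitions = 200,  # upper bound on the number of coalitions considered
  seed = 1
)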

Coming from shapr < 1.0.0?

shapr version >= 1.0.0 comes with a number of breaking changes. Most notably, we moved from using two functions (shapr() and explain()) to one function (explain()). In addition, custom models are now explained by passing the prediction function directly to explain(). Several input arguments were renamed, and a few functions for edge cases were removed to simplify the code base.
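
To make the custom-model workflow concrete, here is a minimal, hedged sketch. It assumes the prediction function is supplied through explain()'s predict_model argument with a (model, newdata) signature, as described in the current documentation; check ?explain for the exact requirements.

# Hedged sketch of explaining a custom model class by supplying the prediction
# function directly. The `predict_model` argument and its (model, newdata)
# signature are assumptions based on the documentation of explain().
library(shapr)

data <- airquality[complete.cases(airquality), ]
x_var <- c("Solar.R", "Wind", "Temp", "Month")
ind_x_explain <- 1:6
x_train <- data[-ind_x_explain, x_var]
x_explain <- data[ind_x_explain, x_var]

# A "custom" model: a list wrapping an lm fit, given its own class
custom_model <- structure(
  list(fit = lm(Ozone ~ Solar.R + Wind + Temp + Month, data = data[-ind_x_explain, ])),
  class = "my_custom_model"
)

explanation_custom <- explain(
  model = custom_model,
  x_explain = x_explain,
  x_train = x_train,
  approach = "empirical",
  phi0 = mean(data$Ozone[-ind_x_explain]),
  predict_model = function(x, newdata) as.numeric(predict(x$fit, newdata)), # assumed signature
  seed = 1
)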

Click here to view a version of this README with the old syntax (v0.2.2).

Python wrapper

We provide a Python wrapper (shaprpy) which allows explaining Python models with the methodology implemented in shapr, directly from Python. The wrapper calls R internally and therefore requires an installation of R. See here for installation instructions and examples.

The package

The shapr R package implements an enhanced version of the Kernel SHAP method for approximating Shapley values, with a strong focus on conditional Shapley values. The core idea is to remain completely model-agnostic while offering a variety of methods for estimating contribution functions, enabling accurate computation of conditional Shapley values across different feature types, dependencies, and distributions. The package also includes evaluation metrics for comparing the different approaches. With parallelized computations, convergence detection, progress updates, and extensive plotting options, shapr is an efficient and user-friendly tool that delivers precise estimates of conditional Shapley values, which are key to understanding how features actually contribute to predictions.

A basic example is provided below. Otherwise, we refer to the pkgdown website and the vignettes there for details and further examples.

Installation

shapr is available on CRAN and can be installed in R as:

install.packages("shapr")

To install the development version of shapr, available on GitHub, use

remotes::install_github("NorskRegnesentral/shapr")

To also install all dependencies, use

remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE)

Example

shapr supports computation of Shapley values with any predictive model that takes a set of numeric features and produces a numeric outcome.

The following example shows how a simple xgboost model is trained using the airquality dataset, and how shapr explains the individual predictions.

We first enable parallel computation and progress updates with the following code chunk. These are optional, but recommended for improved performance and user-friendliness, particularly for problems with many features.

# Enable parallel computation
# Requires the future and future.apply packages
future::plan("multisession", workers = 2) # Increase the number of workers for better performance with many features

# Enable progress updates for the v(S) computations
# Requires the progressr package
progressr::handlers(global = TRUE)
progressr::handlers("cli") # Use the cli package as backend (recommended, as it estimates the remaining time)

Here is the actual example:

library(xgboost)
library(shapr)

data("airquality")
data <- data.table::as.data.table(airquality)
data <- data[complete.cases(data), ]

x_var <- c("Solar.R", "Wind", "Temp", "Month")
y_var <- "Ozone"

ind_x_explain <- 1:6
x_train <- data[-ind_x_explain, ..x_var]
y_train <- data[-ind_x_explain, get(y_var)]
x_explain <- data[ind_x_explain, ..x_var]

# Look at the dependence between the features
cor(x_train)
#>            Solar.R       Wind       Temp      Month
#> Solar.R  1.0000000 -0.1243826  0.3333554 -0.0710397
#> Wind    -0.1243826  1.0000000 -0.5152133 -0.2013740
#> Temp     0.3333554 -0.5152133  1.0000000  0.3400084
#> Month   -0.0710397 -0.2013740  0.3400084  1.0000000

# Fit a basic xgboost model to the training data
model <- xgboost(
  data = as.matrix(x_train),
  label = y_train,
  nround = 20,
  verbose = FALSE
)

# Specify phi_0, i.e., the expected prediction without any features
p0 <- mean(y_train)

# Compute Shapley values with Kernel SHAP, accounting for feature dependence using
# the empirical (conditional) distribution approach with bandwidth parameter sigma = 0.1 (default)
explanation <- explain(
  model = model,
  x_explain = x_explain,
  x_train = x_train,
  approach = "empirical",
  phi0 = p0,
  seed = 1
)
#> 
#> ── Starting `shapr::explain()` at 2025-08-20 15:08:39 ─────────────────────
#> ℹ Feature classes extracted from the model contains `NA`.
#>   Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than or `2^n_features = 16`, and
#>   is therefore set to `2^n_features = 16`.
#> 
#> ── Explanation overview ──
#> 
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: empirical
#> • Procedure: Non-iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 4
#> • Number of observations to explain: 6
#> • Computations (temporary) saved at:
#> '/tmp/RtmpnBYv2R/shapr_obj_2aa833a1e2267.rds'
#> 
#> ── Main computation started ──
#> 
#> ℹ Using 16 of 16 coalitions.

# Print the Shapley values for the observations to explain.
print(explanation)
#>    explain_id  none Solar.R  Wind  Temp  Month
#>         <int> <num>   <num> <num> <num>  <num>
#> 1:          1  43.1  13.212  4.79 -25.6  -5.60
#> 2:          2  43.1  -9.973  5.83 -11.0  -7.83
#> 3:          3  43.1  -2.292 -7.05 -10.2  -4.45
#> 4:          4  43.1   3.325 -3.24 -10.2  -6.66
#> 5:          5  43.1   4.304 -2.63 -14.2 -12.27
#> 6:          6  43.1   0.479 -5.25 -12.6  -6.65

# Provide a formatted summary of the shapr object
summary(explanation)
#> 
#> ── Summary of Shapley value explanation ───────────────────────────────────
#> • Computed with `shapr::explain()` in 2.2 seconds, started 2025-08-20
#> 15:08:39
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: empirical
#> • Procedure: Non-iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 4
#> • Number of observations to explain: 6
#> • Number of coalitions used: 16 (of total 16)
#> • Computations (temporary) saved at:
#> '/tmp/RtmpnBYv2R/shapr_obj_2aa833a1e2267.rds'
#> 
#> ── Estimated Shapley values 
#>    explain_id   none Solar.R   Wind   Temp  Month
#>         <int> <char>  <char> <char> <char> <char>
#> 1:          1  43.09   13.21   4.79 -25.57  -5.60
#> 2:          2  43.09   -9.97   5.83 -11.04  -7.83
#> 3:          3  43.09   -2.29  -7.05 -10.15  -4.45
#> 4:          4  43.09    3.33  -3.24 -10.22  -6.66
#> 5:          5  43.09    4.30  -2.63 -14.15 -12.27
#> 6:          6  43.09    0.48  -5.25 -12.55  -6.65
#> ── Estimated MSEv 
#> Estimated MSE of v(S) = 144 (with sd = 64)

# Finally, we plot the resulting explanations
plot(explanation)
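
The default plot shows bar plots of the Shapley values per observation. As a hedged aside, plot.shapr also offers other plot types; the plot_type values below are assumed from the current documentation of plot.shapr and should be verified there.

# Hedged sketch of alternative plot types (assumed `plot_type` options of plot.shapr)
plot(explanation, plot_type = "waterfall") # waterfall plot per observation
plot(explanation, plot_type = "scatter")   # Shapley value versus feature value
plot(explanation, plot_type = "beeswarm")  # beeswarm plot (assumed to require the ggbeeswarm package)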

See Jullum et al. (2025) (preprint available here) for a software paper with an overview of the methodology and capabilities of the package (as of v1.0.4). See the general usage vignette for further basic usage examples and brief introductions to the methodology. For more thorough information about the underlying methodology, see the methodological papers Aas, Jullum, and Løland (2021), Redelmeier, Jullum, and Aas (2020), Jullum, Redelmeier, and Aas (2021), Olsen et al. (2022), and Olsen et al. (2024). See also Sellereite and Jullum (2019) for a very brief paper about a previous version (v0.1.1) of the package, which had a different structure and syntax and significantly less functionality.
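
Related to the groupShapley paper referenced above, the following hedged sketch computes Shapley values for groups of features rather than individual features, reusing the objects from the example. It assumes explain() accepts a group argument given as a named list of feature-name vectors (check ?explain); the grouping itself is arbitrary and only for illustration.

# Hedged sketch: group-wise Shapley values (assumed `group` argument of explain())
explanation_group <- explain(
  model = model,
  x_explain = x_explain,
  x_train = x_train,
  approach = "empirical",
  phi0 = p0,
  group = list(weather = c("Solar.R", "Wind", "Temp"), time = "Month"),
  seed = 1
)
print(explanation_group) # one Shapley value per group per observation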

Contribution

All feedback and suggestions are very welcome. Details on how to contribute can be found here. If you have any questions or comments, feel free to open an issue here.

Please note that the shapr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

References

Aas, Kjersti, Martin Jullum, and Anders Løland. 2021. “Explaining Individual Predictions When Features Are Dependent: More Accurate Approximations to Shapley Values.” Artificial Intelligence 298. https://doi.org/10.1016/j.artint.2021.103502.

Jullum, Martin, Lars Henry Berge Olsen, Jon Lachmann, and Annabelle Redelmeier. 2025. “Shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python.” arXiv Preprint arXiv:2504.01842. https://arxiv.org/abs/2504.01842.

Jullum, Martin, Annabelle Redelmeier, and Kjersti Aas. 2021. “Efficient and Simple Prediction Explanations with groupShapley: A Practical Perspective.” In Proceedings of the 2nd Italian Workshop on Explainable Artificial Intelligence, 28–43. CEUR Workshop Proceedings.

Olsen, Lars Henry Berge, Ingrid Kristine Glad, Martin Jullum, and Kjersti Aas. 2022. “Using Shapley Values and Variational Autoencoders to Explain Predictive Models with Dependent Mixed Features.” Journal of Machine Learning Research 23 (213): 1–51.

———. 2024. “A Comparative Study of Methods for Estimating Model-Agnostic Shapley Value Explanations.” Data Mining and Knowledge Discovery, 1–48.

Redelmeier, Annabelle, Martin Jullum, and Kjersti Aas. 2020. “Explaining Predictive Models with Mixed Features Using Shapley Values and Conditional Inference Trees.” In International Cross-Domain Conference for Machine Learning and Knowledge Extraction, 117–37. Springer.

Sellereite, Nikolai, and Martin Jullum. 2019. “Shapr: An R-Package for Explaining Machine Learning Models with Dependence-Aware Shapley Values.” Journal of Open Source Software 5 (46): 2027. https://doi.org/10.21105/joss.02027.

Package details

  • Monthly downloads: 3,481
  • Version: 1.0.6
  • License: MIT + file LICENSE
  • Maintainer: Martin Jullum
  • Last published: November 17th, 2025

Functions in shapr (1.0.6)

aicc_full_single_cpp

Temp-function for computing the full AICc with several X's etc
check_categorical_valid_MCsamp

Check that all explicands have at least one valid MC sample in causal Shapley values
categorical_to_one_hot_layer

A torch::nn_module() Representing a categorical_to_one_hot_layer
check_verbose

Function that checks the verbose parameter
cli_iter

Print Messages in Iterative Procedure with CLI
compute_estimates

Compute the Shapley Values and Their Standard Deviation Given v(S)
check_groups

Check that the group parameter has the right form and content
compute_MSEv_eval_crit

Mean squared error of the contribution function v(S)
cli_compute_vS

Print Messages in Compute_vS with CLI
compute_vS

Compute v(S) for All Feature Subsets S
cli_startup

Print Startup Messages with CLI
create_marginal_data_cat

Create marginal categorical data for causal Shapley values
convert_feature_name_to_idx

Convert feature names into feature indices
create_ctree

Build all the conditional inference trees
default_doc_internal

Unexported documentation helper function.
default_doc_export

Exported documentation helper function.
explain_forecast

Explain a Forecast from Time Series Models with Dependence-Aware (Conditional/Observational) Shapley Values
compute_time

Gather and Compute the Timing of the Different Parts of the Explain Function
finalize_explanation

Gather the Final Output to Create the Explanation Object
compute_shapley

Compute Shapley Values
aicc_full_cpp

AICc formula for several sets, alternative definition
exact_coalition_table

Get table with all (exact) coalitions
get_data_specs

Fetches feature information from a given data set
explain

Explain the Output of Machine Learning Models with Dependence-Aware (Conditional/Observational) Shapley Values
additional_regression_setup

Additional Setup for Regression-Based Methods
gauss_cat_sampler_most_likely

A torch::nn_module() Representing a gauss_cat_sampler_most_likely
get_extra_parameters

This includes both extra parameters and other objects
format_info_basic

Internal function to extract a vector with formatted info about the shapr call
format_convergence_info

Internal function to extract formatted info about the (current) convergence state of the shapr call
hat_matrix_cpp

Computing single H matrix in AICc-function using the Mahalanobis distance
inv_gaussian_transform_cpp

Transforms new data to a standardized normal distribution
get_feature_specs

Get feature specifications from the model
lag_data

Lag a matrix of variables a specific number of lags for each variable
mahalanobis_distance_cpp

(Generalized) Mahalanobis distance
get_extra_comp_args_default

Get the Default Values for the Extra Computation Arguments
create_marginal_data_gaussian

Generate marginal Gaussian data using Cholesky decomposition
format_info_extra

Internal function to extract some extra formatted info about the shapr call
plot_SV_several_approaches

Shapley Value Bar Plots for Several Explanation Objects
get_mu_vec

get_mu_vec
plot_MSEv_eval_crit

Plots of the MSEv Evaluation Criterion
get_model_specs

Fetches feature information from natively supported models
num_str

Convert a character to a numeric class
model_checker

Check that the type of model is supported by the native implementation of the model class
paired_sampler

Sampling Paired Observations
get_iterative_args_default

Function to specify arguments of the iterative estimation procedure
get_max_n_coalitions_causal

Get the number of coalitions that respects the causal ordering
observation_impute_cpp

Get imputed data
cli_topline

Create a header topline with cli
prepare_data_causal

Generate Data Used for Predictions and Monte Carlo Integration for Causal Shapley Values
quantile_type7_cpp

Compute the quantiles using quantile type seven
coalition_matrix_cpp

Get coalition matrix
prepare_data_copula_cpp

Generate (Gaussian) Copula MC samples
observation_impute

Generate permutations of training data using test observations
regression.check_vfold_cv_para

Check the regression.vfold_cv_para parameter
regression.cv_message

Produce message about which batch prepare_data is working on
gaussian_transform_separate

Transforms new data to standardized normal (dimension 1) based on other data transformations
reg_forecast_setup

Set up exogenous regressors for explanation in a forecast model.
sample_combinations

Helper function to sample a combination of training and testing rows without risking sampling the same observation twice
sample_coalition_table

Get table with sampled coalitions using the semi-deterministic sampling approach
prepare_next_iteration

Prepare the Next Iteration of the Iterative Sampling Algorithm
plot.shapr

Plot of the Shapley Value Explanations
create_marginal_data_training

Function that samples data from the empirical marginal training distribution
mcar_mask_generator

Missing Completely at Random (MCAR) Mask Generator
sample_coalitions_cpp_str_paired

Return a vector of strings (a CharacterVector), where each string is a space-separated list of integers
sample_ctree

Sample ctree variables from a given conditional inference tree
get_cov_mat

get_cov_mat
get_data_forecast

Set up data for explain_forecast
format_round

Format numbers with rounding
shapley_weights

Calculate Shapley weight
print.shapr

Print Method for Shapr Objects
regression.check_namespaces

Check that needed libraries are installed
regression.check_parameters

Check regression parameters
shapley_setup

Set Up the KernelSHAP Framework
setup_approach

Set up the framework for the chosen approach
shapr-package

shapr: Prediction Explanation with Dependence-Aware Shapley Values
gauss_cat_parameters

A torch::nn_module() Representing a gauss_cat_parameters
predict_model

Generate predictions for input data with specified model
summary.shapr

Summary Method for Shapr Objects
specified_prob_mask_generator

A torch::nn_module() Representing a specified_prob_mask_generator
correction_matrix_cpp

Correction term with trace_input in AICc formula
test_predict_model

Model testing function
testing_cleanup

Clean Out Certain Output Arguments to Allow Perfect Reproducibility of the Output
group_forecast_setup

Set up user provided groups for explanation in a forecast model.
gauss_cat_sampler_random

A torch::nn_module() Representing a gauss_cat_sampler_random
gaussian_transform

Transforms a sample to standardized normal distribution
regression.get_string_to_R

Convert the string into an R object
vaeac

Initializing a vaeac model
create_coalition_table

Define coalitions, and fetch additional information about each unique coalition
get_output_args_default

Get the Default Values for the Output Arguments
skip_connection

A torch::nn_module() Representing a skip connection
get_valid_causal_coalitions

Get all coalitions satisfying the causal ordering
prepare_data_gaussian_cpp

Generate Gaussian MC samples
vaeac_categorical_parse_params

Creates Categorical Distributions
release_questions

Auxiliary function for the vignettes
get_nice_time

Reformat seconds into a human-readable format.
format_shapley_info

Internal function to extract the formatted Shapley value table
prepare_data

Generate data used for predictions and Monte Carlo integration
prepare_data_copula_cpp_caus

Generate (Gaussian) Copula MC samples for the causal setup with a single MC sample for each explicand
get_supported_approaches

Get the Implemented Approaches
get_supported_models

Provide a data.table with the Supported Models
get_S_causal_steps

Get the steps for generating MC samples for coalitions following a causal ordering
gauss_cat_loss

A torch::nn_module() Representing a gauss_cat_loss
vaeac_check_epoch_values

Function that checks provided epoch arguments
prepare_data_gaussian_cpp_caus

Generate Gaussian MC samples for the causal setup with a single MC sample for each explicand
get_predict_model

Get predict_model function
vaeac_check_logicals

Function that checks logicals
get_results

Extract Components from a Shapr Object
setup

Check Setup
prepare_data_single_coalition

Compute the conditional probabilities for a single coalition for the categorical approach
round_manual

Round numbers to the specified number of decimal places
memory_layer

A torch::nn_module() Representing a Memory Layer
vaeac_check_mask_gen

Function that checks the specified masking scheme
vaeac_normal_parse_params

Creates Normal Distributions
vaeac_check_probabilities

Function that checks probabilities
vaeac_check_save_names

Function that checks that the save folder exists and for a valid file name
plot_vaeac_eval_crit

Plot the training VLB and validation IWAE for vaeac models
plot_vaeac_imputed_ggpairs

Plot Pairwise Plots for Imputed and True Data
print_iter

Print Iterative Information
vaeac_get_current_save_state

Function that extracts additional objects from the environment into the state list
vaeac_get_n_decimals

Function to get string of values with specific number of decimals
vaeac_get_save_file_names

Function that creates the save file names for the vaeac model
regression.train_model

Train a Tidymodels Model via Workflows
regression.get_tune

Get if model is to be tuned
vaeac_check_positive_integers

Function that checks positive integers
vaeac_impute_missing_entries

Impute Missing Values Using vaeac
vaeac_get_model_from_checkp

Function to load a vaeac model and set it in the right state and mode
vaeac_get_extra_para_default

Specify the Extra Parameters in the vaeac Model
vaeac_check_positive_numerics

Function that checks positive numerics
vaeac_normalize_data

Normalize mixed data for vaeac
vaeac_update_pretrained_model

Function that checks and adds a pre-trained vaeac model
vaeac_kl_normal_normal

Compute the KL Divergence Between Two Gaussian Distributions.
vaeac_get_evaluation_criteria

Extract the Training VLB and Validation IWAE from a List of Explanations Objects Using the vaeac Approach
vaeac_train_model_continue

Continue to Train the vaeac Model
vaeac_update_para_locations

Move vaeac parameters to correct location
vaeac_get_optimizer

Function to create the optimizer used to train vaeac
regression.check_sur_n_comb

Check the regression.surrogate_n_comb parameter
process_factor_data

Treat factors as numeric values
vaeac_postprocess_data

Postprocess Data Generated by a vaeac Model
regression.get_y_hat

Get the predicted responses
regression.check_recipe_func

Check regression.recipe_func
rss_cpp

Function for computing sigma_hat_sq
save_results

Save the Intermediate Results to Disk
weight_matrix_cpp

Calculate weight matrix
regression.surrogate_aug_data

Augment the training data and the explicands
weight_matrix

Calculate Weighted Matrix
vaeac_check_which_vaeac_model

Function that checks for valid vaeac model name
vaeac_dataset

Dataset used by the vaeac model
vaeac_check_extra_named_list

Check vaeac.extra_parameters list
vaeac_check_masking_ratio

Function that checks that the masking ratio argument is valid
vaeac_train_model

Train the vaeac Model
vaeac_extend_batch

Extends Incomplete Batches by Sampling Extra Data from Dataloader
vaeac_train_model_auxiliary

Function used to train a vaeac model
vaeac_check_parameters

Function that calls all vaeac parameters check functions
vaeac_get_full_state_list

Function that extracts the state list objects from the environment
vaeac_get_x_explain_extended

Function to extend the explicands and apply all relevant masks/coalitions
vaeac_preprocess_data

Preprocess Data for the vaeac approach
vaeac_check_save_parameters

Function that gives a warning about disk usage
vaeac_get_mask_generator_name

Function that determines which mask generator to use
vaeac_get_val_iwae

Compute the Importance Sampling Estimator (Validation Error)
specified_masks_mask_generator

A torch::nn_module() Representing a specified_masks_mask_generator
vaeac_check_cuda

Function that checks for access to CUDA
vaeac_print_train_summary

Function to printout a training summary for the vaeac model
vaeac_check_x_colnames

Function that checks the feature names of data and vaeac model
vaeac_get_data_objects

Function to set up data loaders and save file names
vaeac_check_activation_func

Function that checks the provided activation function
vaeac_compute_normalization

Compute Featurewise Means and Standard Deviations
vaeac_save_state

Function that saves the state list and the current save state of the vaeac model
append_vS_list

Append the New vS_list to the Previous vS_list
check_convergence

Check the Convergence According to the Convergence Threshold