h2o (version 3.44.0.3)

h2o.shap_summary_plot: SHAP Summary Plot

Description

The SHAP summary plot shows the contribution of each feature for each instance (row of data). The sum of the feature contributions and the bias term equals the raw prediction of the model, i.e., the prediction before the inverse link function is applied.

Usage

h2o.shap_summary_plot(
  model,
  newdata,
  columns = NULL,
  top_n_features = 20,
  sample_size = 1000,
  background_frame = NULL
)

Value

A ggplot2 object

Arguments

model

An H2O tree-based model. Only Random Forest, GBM, and XGBoost are supported. Must be a binary classification or regression model.

newdata

An H2O Frame, used to determine feature contributions.

columns

List of column names or list of column indices to show. If specified, the top_n_features parameter is ignored.

top_n_features

Integer specifying the maximum number of columns to show (ranked by variable importance).

sample_size

Integer specifying the maximum number of observations to be plotted.

background_frame

Optional frame that is used as the source of baselines for marginal SHAP.
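
The interplay of columns, sample_size, and the returned ggplot2 object can be sketched as follows. This is a minimal sketch, not the package's own example: it assumes a running H2O cluster, reuses the wine dataset from the Examples section, assumes that a character vector of column names is accepted for columns, and picks three predictors by position purely for illustration. The names shown_columns and p are hypothetical.

```r
library(h2o)
library(ggplot2)
h2o.init()

# Same public dataset as in the Examples section
f <- "https://h2o-public-test-data.s3.amazonaws.com/smalldata/wine/winequality-redwhite-no-BOM.csv"
df <- h2o.importFile(f)
response <- "quality"

splits <- h2o.splitFrame(df, ratios = 0.8, seed = 1)
train <- splits[[1]]
test <- splits[[2]]
gbm <- h2o.gbm(y = response, training_frame = train)

# Illustrative choice: the first three predictors, selected by position
# so that no column name has to be hard-coded
shown_columns <- setdiff(names(df), response)[1:3]

# Because columns is given, top_n_features is ignored;
# sample_size caps the number of plotted observations
p <- h2o.shap_summary_plot(gbm, test,
                           columns = shown_columns,
                           sample_size = 500)

# The result is a ggplot2 object, so ordinary ggplot2 layers can be added
p <- p + ggtitle("SHAP summary for selected columns")
print(p)
```

Selecting columns explicitly is useful when a few features are of interest regardless of their importance rank; otherwise top_n_features alone controls what is shown.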

Examples

if (FALSE) {
library(h2o)
h2o.init()

# Import the wine dataset into H2O:
f <- "https://h2o-public-test-data.s3.amazonaws.com/smalldata/wine/winequality-redwhite-no-BOM.csv"
df <- h2o.importFile(f)

# Set the response
response <- "quality"

# Split the dataset into a train and test set:
splits <- h2o.splitFrame(df, ratios = 0.8, seed = 1)
train <- splits[[1]]
test <- splits[[2]]

# Build and train the model:
gbm <- h2o.gbm(y = response,
               training_frame = train)

# Create the SHAP summary plot
shap_summary_plot <- h2o.shap_summary_plot(gbm, test)
print(shap_summary_plot)
}
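
The additivity property stated in the Description (feature contributions plus the bias term equal the raw prediction) can be checked numerically. The following is a minimal sketch under stated assumptions: it needs a running H2O cluster, uses h2o.predict_contributions() to obtain the per-row contributions, and relies on the regression GBM using an identity link so that the raw prediction is the same as the value returned by h2o.predict(). The names contributions, predictions, and max_abs_diff are illustrative.

```r
library(h2o)
h2o.init()

# Same data and model setup as the example above
f <- "https://h2o-public-test-data.s3.amazonaws.com/smalldata/wine/winequality-redwhite-no-BOM.csv"
df <- h2o.importFile(f)
splits <- h2o.splitFrame(df, ratios = 0.8, seed = 1)
train <- splits[[1]]
test <- splits[[2]]
gbm <- h2o.gbm(y = "quality", training_frame = train)

# Per-row SHAP contributions: one column per feature plus a "BiasTerm" column
contributions <- as.data.frame(h2o.predict_contributions(gbm, test))

# Model predictions for the same rows (assumed identity link, so these are
# the raw predictions)
predictions <- as.data.frame(h2o.predict(gbm, test))$predict

# Summing each row of contributions (features + bias) should reproduce the
# prediction up to floating-point error
max_abs_diff <- max(abs(rowSums(contributions) - predictions))
print(max_abs_diff)
```

For a binary classification model the same row sums would be on the link scale (for example log-odds), so they would have to be passed through the inverse link before being compared with predicted probabilities.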
