HMDA (version 0.3.0)

hmda.row.plot: WMSHAP row-level plot for a single observation (participant or data row)

Description

Computes and visualizes Weighted Mean SHAP (WMSHAP) contributions for a single row (subject/observation) across multiple models in a shapley object. For each feature, the function computes a weighted mean of the row-level SHAP contributions across models, using shapley$weights as the model weights, and reports an approximate 95% interval summarizing variability across models.
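The weighted mean described above can be sketched in base R. This is only an illustration, not the package's implementation: the SHAP matrix, the weights, and the interval formula (a normal approximation of 1.96 times a weighted standard error) are all assumptions made for the example.

```r
# Hypothetical row-level SHAP contributions for ONE observation:
# rows = models, columns = features (values invented for illustration).
shap <- matrix(
  c(0.20, -0.05,  0.01,
    0.30, -0.10, -0.02,
    0.25,  0.00,  0.03),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("model", 1:3), c("f1", "f2", "f3"))
)

# Hypothetical model weights (for example, one performance metric per model),
# normalized so that they sum to 1.
w <- c(0.9, 0.8, 0.7)
w <- w / sum(w)

# Weighted mean contribution per feature: each model's row is scaled by
# its weight, then the columns are summed.
wmean <- colSums(shap * w)

# ASSUMPTION: approximate 95% interval from a weighted variance across models.
# The package may compute its interval differently.
wvar <- colSums(w * sweep(shap, 2, wmean)^2)
se   <- sqrt(wvar / nrow(shap))

wmshap_row <- data.frame(
  feature = names(wmean),
  mean    = unname(wmean),
  lower   = unname(wmean - 1.96 * se),
  upper   = unname(wmean + 1.96 * se)
)
print(wmshap_row)
```

Models with larger weights pull the per-feature mean toward their own SHAP values, and the interval widens when the models disagree about a feature's contribution for this row.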

Usage

hmda.row.plot(
  wmshap,
  row_index,
  top_n_features = NULL,
  features = NULL,
  nonzeroCI = FALSE,
  plot = TRUE,
  print = FALSE
)

Value

A list including the ggplot2 object and the data frame of WMSHAP summary values.

Arguments

wmshap

Object of class 'shapley', as returned by the 'shapley' function or the hmda.wmshap function.

row_index

Integer (length 1). The row/subject identifier to visualize. This is matched against the index column in shapley$results.

top_n_features

Integer. If specified, the top n features with the highest weighted SHAP values are selected. This argument is overridden by the 'features' argument.

features

Optional character vector of feature names to plot. If NULL, all available features in shapley$results are used. Specifying the features argument overrides the top_n_features argument.
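The precedence between features and top_n_features can be illustrated with a small base R sketch. The helper select_features and the summary table are hypothetical and not part of HMDA. Ranking by absolute value is also an assumption, since the documentation only says "highest weighted SHAP values".

```r
# Hypothetical per-feature summary for one row (values invented).
summary_tbl <- data.frame(
  feature = c("f1", "f2", "f3", "f4"),
  mean    = c(0.25, -0.05, 0.01, -0.30),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the documented precedence:
# an explicit 'features' vector wins over 'top_n_features'.
select_features <- function(tbl, top_n_features = NULL, features = NULL) {
  if (!is.null(features)) {
    return(tbl[tbl$feature %in% features, , drop = FALSE])
  }
  if (!is.null(top_n_features)) {
    # ASSUMPTION: rank by absolute weighted mean contribution.
    ord <- order(abs(tbl$mean), decreasing = TRUE)
    n   <- min(top_n_features, length(ord))
    return(tbl[ord[seq_len(n)], , drop = FALSE])
  }
  tbl  # neither argument given: keep all features
}

select_features(summary_tbl, top_n_features = 2)
select_features(summary_tbl, top_n_features = 2, features = "f2")
```

In the second call only f2 is returned, because the explicit feature list takes priority over the top-n selection.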

nonzeroCI

Logical. If TRUE, features whose confidence interval crosses zero are not plotted.
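As a rough illustration of what such a filter does, the base R sketch below drops features whose interval includes zero. The table, its column names (lower, upper), and the treatment of an interval that touches zero exactly are assumptions for the example, not the package's internals.

```r
# Hypothetical WMSHAP summary for one row, with interval bounds (values invented).
ci_tbl <- data.frame(
  feature = c("f1", "f2", "f3"),
  mean    = c(0.25, -0.05, 0.01),
  lower   = c(0.19, -0.11, -0.02),
  upper   = c(0.31,  0.01,  0.04),
  stringsAsFactors = FALSE
)

# An interval crosses zero when its bounds lie on opposite sides of zero.
# ASSUMPTION: an interval that touches zero exactly also counts as crossing.
crosses_zero <- ci_tbl$lower <= 0 & ci_tbl$upper >= 0

# Keep only the features whose interval lies entirely above or below zero.
kept <- ci_tbl[!crosses_zero, , drop = FALSE]
print(kept)
```

Here only f1 remains, since the intervals for f2 and f3 both span zero.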

plot

Logical. If TRUE, prints the plot.

print

Logical. If TRUE, prints the computed summary table for the row.

Author

E. F. Haghish

Examples

if (FALSE) {
  library(HMDA)
  library(h2o)
  hmda.init()

  # Import a sample binary outcome dataset into H2O
  train <- h2o.importFile(
    "https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_train_10k.csv")
  test <- h2o.importFile(
    "https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_test_5k.csv")

  # Identify predictors and response
  y <- "response"
  x <- setdiff(names(train), y)

  # For binary classification, response should be a factor
  train[, y] <- as.factor(train[, y])
  test[, y] <- as.factor(test[, y])

  params <- list(learn_rate = c(0.01, 0.1),
                 max_depth = c(3, 5, 9),
                 sample_rate = c(0.8, 1.0)
  )

  # Train and validate a cartesian grid of GBMs
  hmda_grid1 <- hmda.grid(algorithm = "gbm", x = x, y = y,
                          grid_id = "hmda_grid1",
                          training_frame = train,
                          nfolds = 10,
                          ntrees = 100,
                          seed = 1,
                          hyper_params = params)

  # Compute weighted mean SHAP values
  wmshap <- hmda.wmshap(models = hmda_grid1,
                        newdata = test,
                        performance_metric = "aucpr",
                        standardize_performance_metric = FALSE,
                        performance_type = "xval",
                        minimum_performance = 0,
                        method = "mean",
                        cutoff = 0.01,
                        plot = TRUE)

  #######################################################
  ### PLOT THE WEIGHTED MEAN SHAP VALUES FOR A PARTICULAR CASE
  #######################################################
  hmda.row.plot(wmshap, row_index = 13)
}