HMDA (version 0.2.0)

hmda.domain: Domain-level WMSHAP summary and plot

Description

Wrapper around shapley.domain that computes and visualizes weighted mean SHAP ratios (WMSHAP) at the domain, group, or factor level. Domains are user-defined clusters of feature names (e.g., latent factors or conceptual groups). The function aggregates feature-level contributions into domain-level contributions and returns a plot with confidence intervals.
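
To make the idea of aggregating feature-level contributions into domain-level contributions concrete, here is a minimal base R sketch on toy numbers. It is not the HMDA or shapley implementation: the `ratios` matrix, the `weights` vector, and the rule "sum within a domain per model, then take a performance-weighted mean across models" are all assumptions made for illustration.

```r
# Toy illustration only; NOT the HMDA/shapley implementation.
# Rows are features, columns are models. Each column holds hypothetical
# SHAP ratios that sum to 1 within a model.
ratios <- matrix(
  c(0.40, 0.30, 0.20, 0.10,   # model m1
    0.35, 0.25, 0.25, 0.15,   # model m2
    0.45, 0.25, 0.15, 0.15),  # model m3
  nrow = 4,
  dimnames = list(c("x1", "x2", "x3", "x4"), c("m1", "m2", "m3"))
)

# Hypothetical performance weights, one per model
weights <- c(m1 = 0.80, m2 = 0.70, m3 = 0.90)

# Hypothetical domains: named list of character vectors of feature names
domains <- list(GroupA = c("x1", "x2"), GroupB = c("x3", "x4"))

# Step 1 (assumed rule): sum the feature ratios inside each domain,
# separately for every model. Result: models in rows, domains in columns.
domain_by_model <- sapply(
  domains,
  function(f) colSums(ratios[f, , drop = FALSE])
)

# Step 2 (assumed rule): performance-weighted mean across models
domain_wmshap <- apply(domain_by_model, 2, weighted.mean, w = weights)

print(round(domain_wmshap, 3))
```

Because every feature belongs to exactly one domain in this toy setup, the domain-level values still sum to 1, which makes them easy to compare across domains.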

Usage

hmda.domain(
  wmshap,
  domains,
  plot = "bar",
  print = FALSE,
  colorcode = NULL,
  xlab = "Factors"
)

Value

A ggplot object, returned invisibly. If print is TRUE, the domain summary is also printed.

Arguments

wmshap

object of class 'shapley', as returned by the 'shapley' function

domains

character list, specifying the domains for grouping the features' contributions. Domains are clusters of feature names that can be used to compute WMSHAP at a higher level, along with their 95% confidence intervals, to better understand how a cluster of features influences the outcome. Note that only one of the 'features' or 'domains' arguments can be specified at a time.
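
Since domains is a named list of character vectors, it can help to sanity-check it before calling hmda.domain. The sketch below uses only base R; `check_domains` and the `available` vector of predictor names are hypothetical helpers introduced here, not part of HMDA. Whether overlapping domains are permitted is not stated on this page, so the helper only reports overlaps rather than rejecting them.

```r
# Hypothetical helper (not part of HMDA): report feature names that are
# unknown or that appear in more than one domain.
check_domains <- function(domains, available) {
  # domains must be a list whose elements all have non-empty names
  stopifnot(
    is.list(domains),
    !is.null(names(domains)),
    all(nzchar(names(domains)))
  )
  all_features <- unlist(domains, use.names = FALSE)
  list(
    unknown    = setdiff(all_features, available),
    duplicated = unique(all_features[duplicated(all_features)])
  )
}

# Assumed predictor names, for illustration only
available <- paste0("x", 1:28)

domains <- list(
  Group1 = c("x22", "x18", "x14"),
  Group2 = c("x25", "x23"),
  Group3 = c("x28", "x26")
)

result <- check_domains(domains, available)
str(result)
```

With real data, `available` could be taken from the predictor names used to train the models, such as the `x` vector in the example at the end of this page.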

plot

character, specifying the type of the plot, which can be either 'bar' or 'wmshap'. The default is 'bar'.

print

logical. If TRUE, the WMSHAP summary table is printed. The default is FALSE.

colorcode

Character vector specifying the color name for each domain in the plot. The default is NULL.

xlab

Character label for the WMSHAP domains or factors. The default is "Factors".

Author

E. F. Haghish

Examples

if (FALSE) {
  library(HMDA)
  library(h2o)
  hmda.init()

  # Import a sample binary outcome dataset into H2O
  train <- h2o.importFile(
    "https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_train_10k.csv")
  test <- h2o.importFile(
    "https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_test_5k.csv")

  # Identify predictors and response
  y <- "response"
  x <- setdiff(names(train), y)

  # For binary classification, response should be a factor
  train[, y] <- as.factor(train[, y])
  test[, y] <- as.factor(test[, y])

  params <- list(learn_rate = c(0.01, 0.1),
                 max_depth = c(3, 5, 9),
                 sample_rate = c(0.8, 1.0)
  )

  # Train and validate a cartesian grid of GBMs
  hmda_grid1 <- hmda.grid(algorithm = "gbm", x = x, y = y,
                          grid_id = "hmda_grid1",
                          training_frame = train,
                          nfolds = 10,
                          ntrees = 100,
                          seed = 1,
                          hyper_params = params)

  # Compute weighted mean SHAP (WMSHAP) values
  wmshap <- hmda.wmshap(models = hmda_grid1,
                        newdata = test,
                        performance_metric = "aucpr",
                        standardize_performance_metric = FALSE,
                        performance_type = "xval",
                        minimum_performance = 0,
                        method = "mean",
                        cutoff = 0.01,
                        plot = TRUE)

  # Define domains to combine their WMSHAP values
  # =============================================
  #
  # There are different ways to specify a cluster of features, or even
  # a group of factors that touch on a broader domain. HMDA includes an
  # exploratory factor analysis procedure to help with this process
  # (see ?hmda.efa). Here we simply "assume" that we have good reasons
  # to combine some of the features into the following clusters:

  domains <- list(Group1 = c("x22", "x18", "x14", "x1", "x10", "x4"),
                  Group2 = c("x25", "x23", "x6", "x27"),
                  Group3 = c("x28", "x26"))

  hmda.domain(wmshap = wmshap,
              plot = "bar",
              domains = domains,
              print = TRUE)
}
