h2o.partialPlot: Partial Dependence Plots

Description

A partial dependence plot gives a graphical depiction of the marginal effect of a variable on the response. The effect of a variable is measured as the change in the mean response. Note: unlike randomForest's partialPlot, this function returns the mean response (probabilities) when plotting partial dependence, rather than the mean of the log class probability.
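
To make the definition concrete, here is a minimal sketch of how a one-dimensional partial dependence curve can be computed by hand. It uses base R only, with a logistic regression on the built-in mtcars data standing in for an H2O model. The helper name partial_dependence, the output column names, and the choice of grid are illustrative assumptions; this is not the h2o implementation.

```r
# Stand-in model: probability of a manual transmission (am = 1)
# as a function of weight and horsepower.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial())

# Hypothetical helper: for each grid value, overwrite the feature in every
# row, score the modified data, and summarise the predicted probabilities.
partial_dependence <- function(model, data, feature, grid) {
  rows <- lapply(grid, function(value) {
    modified <- data
    modified[[feature]] <- value
    p <- predict(model, newdata = modified, type = "response")
    data.frame(value = value,
               mean_response = mean(p),
               stddev_response = sd(p))
  })
  out <- do.call(rbind, rows)
  names(out)[1] <- feature
  out
}

# 20 equally spaced points across the observed range of wt
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 20)
pd_wt <- partial_dependence(fit, mtcars, "wt", wt_grid)
print(head(pd_wt))
```

Each row of pd_wt is one point on the curve: the mean predicted probability when wt is held at that value for every observation, which is the quantity the Description calls the mean response.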

Usage

h2o.partialPlot(
  object,
  data,
  cols,
  destination_key,
  nbins = 20,
  plot = TRUE,
  plot_stddev = TRUE,
  weight_column = -1,
  include_na = FALSE,
  user_splits = NULL,
  col_pairs_2dpdp = NULL,
  save_to = NULL,
  row_index = -1
)

Arguments

object

An H2OModel object.

data

An H2OFrame object used for scoring and constructing the plot.

cols

Feature(s) for which partial dependence will be calculated.

destination_key

A key reference to the partial dependence tables created in H2O.

nbins

Number of bins used. For categorical columns, make sure the number of bins exceeds the number of levels. If missing values are included (include_na = TRUE), the returned table has nbins + 1 rows.

plot

A logical specifying whether to plot the partial dependence table.

plot_stddev

A logical specifying whether to add the standard error to the partial dependence plot.

weight_column

A string denoting which column of data should be used as the weight column.

include_na

A logical specifying whether missing values should be included in the feature values.

user_splits

A two-level nested list containing user-defined split points for the partial dependence plot of each column. If two columns use user-defined split points, the nested list should contain two lists. Inside each list, the first element is the column name, followed by the values defined by the user.

col_pairs_2dpdp

A two-level nested list like this: col_pairs_2dpdp = list(c("col1_name", "col2_name"), c("col1_name", "col3_name"), ...), where a 2D partial dependence plot is generated for the col1_name, col2_name pair, for the col1_name, col3_name pair, and for any other pairs specified in the nested list.

save_to

Fully qualified prefix of the image files the resulting plots should be saved to, e.g. '/home/user/pdp'. Plots for each feature are saved separately in PNG format; each file receives a suffix equal to the corresponding feature name, e.g. '/home/user/pdp_AGE.png'. If the files already exist, they will be overwritten. Files are only saved if plot = TRUE (default).

row_index

Row for which partial dependence will be calculated instead of the whole input frame.
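
The nested-list arguments described above are easier to read in code. The sketch below builds a user_splits list and a col_pairs_2dpdp list for the AGE and RACE columns of the prostate data used in the Examples section. The split values, the RACE levels "1" and "2", the use of c() for the inner elements, and the wrapper name pdp_with_custom_splits are all illustrative assumptions rather than documented requirements. The wrapper is only defined here, not called; invoking it needs a running H2O cluster and a model trained on these columns.

```r
# user_splits: one inner vector per column. The first element is the column
# name, the remaining elements are the split points (illustrative values).
# Note that c() coerces the numeric values to character strings.
user_splits <- list(
  c("AGE", 50, 60, 70),   # numeric column: explicit grid values
  c("RACE", "1", "2")     # categorical column: levels to evaluate (assumed)
)

# col_pairs_2dpdp: one inner vector of two column names per 2D plot.
col_pairs <- list(c("AGE", "RACE"))

# Hypothetical wrapper around h2o.partialPlot; defined but not called here.
pdp_with_custom_splits <- function(model, frame) {
  h2o::h2o.partialPlot(
    object = model,
    data = frame,
    cols = c("AGE", "RACE"),
    user_splits = user_splits,
    col_pairs_2dpdp = col_pairs,
    plot = FALSE
  )
}

# Inspect the shape of the nested list
str(user_splits)
```

With the objects from the Examples section, a call such as pdp_with_custom_splits(prostate_gbm, prostate) would be the way to try this out; setting plot = FALSE is a choice made here to return the tables without drawing anything.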

Value

Plot and list of calculated mean response tables for each feature requested.

Examples

# NOT RUN {
library(h2o)
h2o.init()
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.uploadFile(path = prostate_path)
prostate[, "CAPSULE"] <- as.factor(prostate[, "CAPSULE"])
prostate[, "RACE"] <- as.factor(prostate[, "RACE"])
prostate_gbm <- h2o.gbm(x = c("AGE","RACE"),
                        y = "CAPSULE",
                        training_frame = prostate,
                        ntrees = 10,
                        max_depth = 5,
                        learn_rate = 0.1)
h2o.partialPlot(object = prostate_gbm, data = prostate, cols = c("AGE", "RACE"))
# }
