h2o.varimp_heatmap: Variable Importance Heatmap across multiple models

Description

Variable importance heatmap shows variable importance across multiple models. Some models in H2O return variable importance for one-hot (binary indicator) encoded versions of categorical columns (e.g. Deep Learning, XGBoost). In order for the variable importance of categorical columns to be compared across all model types we compute a summarization of the the variable importance across all one-hot encoded features and return a single variable importance for the original categorical feature. By default, the models and variables are ordered by their similarity.

Usage

h2o.varimp_heatmap(object, top_n = 20)

Arguments

object

An H2OAutoML object or list of H2O models.

top_n

Integer specifying the number models shown in the heatmap (based on leaderboard ranking). Defaults to 20.

Value

A ggplot2 object.

Examples

Run this code

# NOT RUN {
library(h2o)
h2o.init()

# Import the wine dataset into H2O:
f <- "https://h2o-public-test-data.s3.amazonaws.com/smalldata/wine/winequality-redwhite-no-BOM.csv"
df <-  h2o.importFile(f)

# Set the response
response <- "quality"

# Split the dataset into a train and test set:
splits <- h2o.splitFrame(df, ratios = 0.8, seed = 1)
train <- splits[[1]]
test <- splits[[2]]

# Build and train the model:
aml <- h2o.automl(y = response,
                  training_frame = train,
                  max_models = 10,
                  seed = 1)

# Create the variable importance heatmap
varimp_heatmap <- h2o.varimp_heatmap(aml)
print(varimp_heatmap)
# }

Run the code above in your browser using DataLab