HMDA (version 0.3.0)

hmda.plot.metrics: Plot model performance metrics across a grid of models

Description

Creates a line plot comparing multiple (maximize-type) performance metrics across a set of models. The input data frame is typically the output of hmda.grid.analysis and must contain a model_ids column and one or more numeric metric columns (e.g., aucpr, mcc, f2).

The function can either plot the first n_models rows (criteria = "n_models") or include all models that achieve at least tolerance times the best value for at least one metric (criteria = "rashomon").
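The two selection rules can be illustrated with a minimal base-R sketch. It is not the package's implementation: the data frame is a made-up stand-in for the output of hmda.grid.analysis, the name `threshold` is hypothetical, and the "rashomon" rule is read literally from the sentence above (value at least a given fraction of the best value, for at least one metric).

```r
# Toy stand-in for the output of hmda.grid.analysis (values are made up).
df <- data.frame(
  model_ids = c("m1", "m2", "m3", "m4"),
  auc       = c(0.90, 0.89, 0.80, 0.70),
  aucpr     = c(0.60, 0.75, 0.50, 0.40),
  stringsAsFactors = FALSE
)
metrics <- c("auc", "aucpr")

# criteria = "n_models": keep the first n_models rows.
n_models <- 3
first_n <- head(df, n_models)

# criteria = "rashomon", read literally from the description:
# keep a model if, for at least one metric, its value is at least
# `threshold` (hypothetical name) times the best value of that metric.
threshold <- 0.95
best  <- sapply(df[metrics], max)                      # best value per metric
ratio <- sweep(as.matrix(df[metrics]), 2, best, "/")   # value / best, per column
keep  <- apply(ratio >= threshold, 1, any)             # TRUE if any metric qualifies
rashomon <- df[keep, ]
```

In this toy data, m1 qualifies through auc and m2 through both metrics, while m3 and m4 fall below 95% of the best value on every metric and are dropped.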

Usage

hmda.plot.metrics(
  df,
  metrics = c("auc", "aucpr", "r2", "mcc", "f2"),
  criteria = "rashomon",
  n_models = 100,
  tolerance = 0.05,
  plot = TRUE,
  title = NULL
)

Arguments

df

A data frame of class "hmda.grid.analysis" containing a column model_ids and numeric metric columns.

metrics

Character vector of column names in df to be plotted.

criteria

Character. One of "n_models" or "rashomon" (default).

n_models

Integer. Number of top rows to plot when criteria = "n_models".

tolerance

Numeric in (0, 1). Alternative to n_models. Selects all models within a given percentage distance of the best value for each metric (direction-aware). Specify either n_models or tolerance, not both. For example, when the metric is AUC and the tolerance is set to 1%, the function selects models whose AUC is at least 99% of the highest AUC among the models.
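The arithmetic of the 1% AUC example can be worked through in a few lines of base R. This is only a sketch: it assumes tolerance is a relative distance from the best value, as this argument description states, and the AUC values and the name `cutoff` are invented for illustration.

```r
# Hypothetical AUC values for five models.
auc <- c(m1 = 0.900, m2 = 0.895, m3 = 0.892, m4 = 0.880, m5 = 0.850)

tolerance <- 0.01                     # 1% distance from the best value
cutoff <- (1 - tolerance) * max(auc)  # 99% of the highest AUC, about 0.891

# Models whose AUC is at least the cutoff are selected.
selected <- names(auc)[auc >= cutoff]
```

Here m1, m2, and m3 lie within 1% of the best AUC; m4 and m5 do not.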

plot

Logical. If TRUE, prints the plot.

title

Character. Title to add to the plot.

Author

E. F. Haghish

Examples

if (FALSE) {
  library(HMDA)
  library(h2o)

  # Example: Create a hyperparameter grid for GBM models.
  predictors <- c("var1", "var2", "var3")
  response <- "target"

  # Define hyperparameter ranges
  hyper_params <- list(
    ntrees = seq(50, 150, by = 25),
    max_depth = c(5, 10, 15),
    learn_rate = c(0.01, 0.05, 0.1),
    sample_rate = c(0.8, 1.0),
    col_sample_rate = c(0.8, 1.0)
  )

  # Run the grid search
  grid <- hmda.grid(
    algorithm = "gbm",
    x = predictors,
    y = response,
    training_frame = h2o.getFrame("hmda.train.hex"),
    hyper_params = hyper_params,
    nfolds = 10,
    stopping_metric = "AUTO"
  )

  # Assess the performances of the models
  grid_performance <- hmda.grid.analysis(grid)

  # Plot the metrics of models that are within 95% of the best value
  # for each of the specified metrics
  hmda.plot.metrics(grid_performance,
                    criteria = "rashomon",
                    tolerance = 0.95,
                    metrics = c("auc", "aucpr", "r2", "mcc", "f2"))

}