
HMDA (version 0.2.0)

hmda.plot.metrics: Plot model performance metrics across a grid of models

Description

Creates a line plot that compares several performance metrics across a set of models. All metrics are treated as maximize-type, i.e., higher values indicate better performance. The input data frame is typically the output of hmda.grid.analysis() and must contain a model_ids column and one or more numeric metric columns (e.g., aucpr, mcc, f2).

The function can either plot the first top_models rows (criteria = "top_models") or include all models that achieve at least distance_percentage times the best value for at least one metric (criteria = "distance_percentage").
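The two selection rules can be sketched in base R as follows. select_models() is a hypothetical helper written only for illustration; it is not part of HMDA and is not necessarily how hmda.plot.metrics() filters models internally. For criteria = "top_models" it assumes df is already sorted with the best models first, and the small data frame is made-up toy data.

```r
# Hypothetical helper (not part of HMDA): sketches the two selection criteria.
select_models <- function(df, metrics,
                          criteria = c("distance_percentage", "top_models"),
                          top_models = 100,
                          distance_percentage = 0.95) {
  criteria <- match.arg(criteria)

  if (criteria == "top_models") {
    # Keep the first `top_models` rows; assumes df is sorted best-first
    return(head(df, top_models))
  }

  # For each metric, flag models whose value reaches best * distance_percentage
  passes <- lapply(metrics, function(m) {
    x <- df[[m]]
    best <- max(x, na.rm = TRUE)
    !is.na(x) & x >= best * distance_percentage
  })

  # A model is kept if it passes the threshold for at least one metric
  keep <- Reduce(`|`, passes)
  df[keep, , drop = FALSE]
}

# Toy input (made-up values) shaped like a grid-analysis data frame
grid_performance <- data.frame(
  model_ids = c("m1", "m2", "m3"),
  aucpr     = c(0.80, 0.77, 0.60),
  mcc       = c(0.50, 0.30, 0.40),
  stringsAsFactors = FALSE
)

selected <- select_models(grid_performance, metrics = c("aucpr", "mcc"))
print(selected$model_ids)
```

With the toy data, the aucpr threshold is 0.95 × 0.80 = 0.76 and the mcc threshold is 0.95 × 0.50 = 0.475, so m1 and m2 are kept while m3 (0.60 and 0.40) is dropped. In this sketch, a metric whose best value is negative yields a threshold above that best value, so no model passes on that metric.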

Usage

hmda.plot.metrics(
  df,
  metrics = c("auc", "aucpr", "r2", "mcc", "f2"),
  criteria = "distance_percentage",
  top_models = 100,
  distance_percentage = 0.95,
  plot = TRUE,
  title = NULL
)

Arguments

df

A data frame of class "hmda.grid.analysis" containing a model_ids column and one or more numeric metric columns.

metrics

Character vector of column names in df to be plotted.

criteria

Character. One of "top_models" or "distance_percentage" (default).

top_models

Integer. Number of top rows to plot when criteria = "top_models".

distance_percentage

Numeric in (0, 1]. When criteria = "distance_percentage", a model is included if, for at least one metric, its value is \(\ge\) best(metric) * distance_percentage, where best(metric) is the highest value of that metric among the models in df.

plot

Logical. If TRUE, prints the plot.

title

Character. Optional title for the plot.

Author

E. F. Haghish

Examples

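if (FALSE) {
  library(HMDA)
  library(h2o)
  h2o.init()

  # Example: Create a hyperparameter grid for GBM models.
  # Assumes a training frame named "hmda.train.hex" already exists in the
  # H2O cluster, e.g. created with as.h2o(train, destination_frame = "hmda.train.hex")
  predictors <- c("var1", "var2", "var3")
  response <- "target"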

  # Define hyperparameter ranges
  hyper_params <- list(
    ntrees = seq(50, 150, by = 25),
    max_depth = c(5, 10, 15),
    learn_rate = c(0.01, 0.05, 0.1),
    sample_rate = c(0.8, 1.0),
    col_sample_rate = c(0.8, 1.0)
  )

  # Run the grid search
  grid <- hmda.grid(
    algorithm = "gbm",
    x = predictors,
    y = response,
    training_frame = h2o.getFrame("hmda.train.hex"),
    hyper_params = hyper_params,
    nfolds = 10,
    stopping_metric = "AUTO"
  )

  # Assess the performances of the models
  grid_performance <- hmda.grid.analysis(grid)

  # Plot the models that reach at least 95% of the best value
  # on at least one of the specified metrics
  hmda.plot.metrics(grid_performance,
                    criteria = "distance_percentage",
                    distance_percentage = 0.95,
                    metrics = c("auc", "aucpr", "r2", "mcc", "f2"))

}
