evaluate_models: Evaluate Models Function

Description

Evaluates the trained models on the test data and computes performance metrics.

Usage

evaluate_models(
  models,
  train_data,
  test_data,
  label,
  start_col,
  time_col,
  status_col,
  task,
  metric = NULL,
  event_class,
  eval_times = NULL,
  bootstrap_ci = TRUE,
  bootstrap_samples = 500,
  bootstrap_seed = 1234,
  at_risk_threshold = 0.1
)

Value

A list with two elements:

performance: A named list of performance metric tibbles for each model.
predictions: A named list with one data frame per model, each with columns including the truth, predictions, and probabilities.
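
As a rough illustration of how a result shaped like this might be navigated, the sketch below builds a hypothetical stand-in for the returned list by hand. The model name `rand_forest`, the column names, and every value are assumptions made up for the example; they are not output from evaluate_models.

```r
# Hypothetical stand-in for the list returned by evaluate_models().
# The model name, column names, and numbers are invented for illustration.
res <- list(
  performance = list(
    rand_forest = data.frame(
      metric = c("accuracy", "roc_auc"),
      estimate = c(0.91, 0.95)
    )
  ),
  predictions = list(
    rand_forest = data.frame(
      truth = factor(c("yes", "no", "yes"), levels = c("yes", "no")),
      estimate = factor(c("yes", "no", "no"), levels = c("yes", "no")),
      prob_yes = c(0.8, 0.3, 0.4)
    )
  )
)

# Metrics for a single model, looked up by its name in the list
rf_perf <- res$performance[["rand_forest"]]

# Stack every model's metrics into one data frame, tagged with the model name
all_perf <- do.call(rbind, lapply(names(res$performance), function(nm) {
  cbind(model = nm, res$performance[[nm]])
}))
```

Because both elements are named lists keyed by model, the same `lapply` over `names(...)` pattern applies to `res$predictions`.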

Arguments

models: A list of trained model objects.
train_data: Preprocessed training data frame.
test_data: Preprocessed test data frame.
label: Name of the target variable. For survival analysis this should be a character vector of length two giving the names of the time and status columns.
start_col: Optional string. The name of the column specifying the start time in counting-process survival data, i.e., data in `(start, stop, event)` form. Only used when task = "survival".
time_col: String. The name of the column specifying the event or censoring time (the "stop" time in counting process data). Only used when task = "survival".
status_col: String. The name of the column specifying the event status (e.g., 0 for censored, 1 for event). Only used when task = "survival".
task: Type of task: "classification", "regression", or "survival".
metric: The performance metric to optimize (e.g., "accuracy", "rmse"). Defaults to NULL.
event_class: A single string. Either "first" or "second" to specify which level of truth to consider as the "event".
eval_times: Optional numeric vector of evaluation horizons for survival metrics. Passed through to process_model.
bootstrap_ci: Logical indicating whether bootstrap confidence intervals should be computed for the evaluation metrics.
bootstrap_samples: Number of bootstrap resamples used when bootstrap_ci = TRUE.
bootstrap_seed: Optional integer seed for the bootstrap procedure used in metric estimation.
at_risk_threshold: Minimum proportion of subjects that must remain at risk to define \(t_{max}\) when computing survival metrics such as the integrated Brier score.
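
The bootstrap and at-risk arguments can be made concrete with a small base R sketch. It is not the package's implementation: `bootstrap_ci_sketch`, `t_max_sketch`, and `rmse` are hypothetical helpers, the interval is a plain percentile bootstrap, and \(t_{max}\) is read here as the largest observed time at which at least the threshold proportion of subjects is still at risk, which is only one plausible interpretation.

```r
# Percentile bootstrap interval for a metric on test-set predictions.
# Hypothetical helper; metric_fn takes (truth, estimate) and returns one number.
bootstrap_ci_sketch <- function(truth, estimate, metric_fn,
                                samples = 500, seed = 1234, level = 0.95) {
  # Note: seeding here resets the global RNG state of the session.
  if (!is.null(seed)) set.seed(seed)
  n <- length(truth)
  stats <- vapply(seq_len(samples), function(i) {
    idx <- sample.int(n, n, replace = TRUE)  # resample rows with replacement
    metric_fn(truth[idx], estimate[idx])
  }, numeric(1))
  alpha <- (1 - level) / 2
  c(estimate = metric_fn(truth, estimate),
    lower = unname(quantile(stats, alpha)),
    upper = unname(quantile(stats, 1 - alpha)))
}

# Assumed reading of t_max: the largest observed time at which the share of
# subjects with time >= t is still at least `threshold`.
t_max_sketch <- function(time, threshold = 0.1) {
  times <- sort(unique(time))
  at_risk <- vapply(times, function(t) mean(time >= t), numeric(1))
  max(times[at_risk >= threshold])
}

# Simulated regression predictions, for illustration only
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))
set.seed(1)
truth <- rnorm(100)
estimate <- truth + rnorm(100, sd = 0.5)
ci <- bootstrap_ci_sketch(truth, estimate, rmse)

# Simulated follow-up times, for illustration only
follow_up <- rexp(200, rate = 0.2)
t_max <- t_max_sketch(follow_up, threshold = 0.1)
```

With a fixed seed, repeated calls return the same interval, which is the practical purpose of an argument like bootstrap_seed. Raising the threshold moves \(t_{max}\) earlier, so time-integrated metrics avoid the sparse tail of the follow-up distribution.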