Evaluates the trained models on the test data and computes performance metrics.
evaluate_models(
  models,
  train_data,
  test_data,
  label,
  start_col,
  time_col,
  status_col,
  task,
  metric = NULL,
  event_class,
  eval_times = NULL,
  bootstrap_ci = TRUE,
  bootstrap_samples = 500,
  bootstrap_seed = 1234,
  at_risk_threshold = 0.1
)

A list with two elements:
A named list of performance metric tibbles for each model.
A named list of data frames, one per model, with columns including truth, predictions, and probabilities.
models: A list of trained model objects.
train_data: Preprocessed training data frame.
test_data: Preprocessed test data frame.
label: Name of the target variable. For survival analysis, this should be a character vector of length two giving the names of the time and status columns.
start_col: Optional string. The name of the column giving the start time in counting-process survival data (e.g., `(start, stop, event)` format). Only used when `task = "survival"`.
time_col: String. The name of the column giving the event or censoring time (the "stop" time in counting-process data). Only used when `task = "survival"`.
status_col: String. The name of the column giving the event status (e.g., 0 for censored, 1 for event). Only used when `task = "survival"`.
task: Type of task: "classification", "regression", or "survival".
metric: The performance metric to optimize (e.g., "accuracy", "rmse").
event_class: A single string, either "first" or "second", specifying which level of the truth factor is treated as the "event".
eval_times: Optional numeric vector of evaluation horizons for survival metrics. Passed through to process_model.
bootstrap_ci: Logical. Whether to compute bootstrap confidence intervals for the evaluation metrics.
bootstrap_samples: Number of bootstrap resamples used when `bootstrap_ci = TRUE`.
bootstrap_seed: Optional integer seed for the bootstrap procedure used in metric estimation.
at_risk_threshold: Minimum proportion of subjects that must remain at risk to define \(t_{max}\) when computing survival metrics such as the integrated Brier score.
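The role of at_risk_threshold in choosing \(t_{max}\) can be illustrated with a minimal base-R sketch. It assumes right-censored data in which every subject enters at time 0 and treats a subject as "at risk at t" when their observed time is at least t. The helper compute_t_max is hypothetical and is not the package's internal implementation.

```r
# Hypothetical helper (not the package's internal code): the largest observed
# time at which at least `threshold` of the subjects are still at risk.
# Assumes right-censored data in which every subject enters at time 0.
compute_t_max <- function(time, threshold = 0.1) {
  stopifnot(is.numeric(time), length(time) > 0,
            threshold > 0, threshold <= 1)
  candidates <- sort(unique(time))
  # Proportion of subjects whose event/censoring time is >= t (still at risk at t)
  prop_at_risk <- vapply(candidates, function(t) mean(time >= t), numeric(1))
  # The earliest candidate always has proportion 1, so this set is never empty
  max(candidates[prop_at_risk >= threshold])
}

# Ten illustrative follow-up times
follow_up <- c(2, 3, 5, 7, 8, 10, 12, 15, 20, 30)
compute_t_max(follow_up, threshold = 0.15)  # 20
compute_t_max(follow_up, threshold = 0.25)  # 15
```

A larger threshold pulls \(t_{max}\) earlier, trading horizon length for more stable metric estimates in the tail. With counting-process data, the at-risk indicator would instead be `start < t & t <= stop`, counted per subject.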
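The event_class, bootstrap_samples, and bootstrap_seed arguments can be illustrated together with a minimal base-R sketch. It assumes a two-level factor outcome, uses sensitivity as the example metric, and computes a percentile bootstrap interval by resampling test rows with replacement. The helpers event_level, sens, and bootstrap_metric_ci are hypothetical; the package's own resampling scheme and interval type may differ.

```r
# Hypothetical helpers (not the package's internals).

# "first"/"second" picks the first or second level of a two-level factor
event_level <- function(truth, event_class = c("first", "second")) {
  event_class <- match.arg(event_class)
  lv <- levels(truth)
  stopifnot(length(lv) == 2)
  if (event_class == "first") lv[1] else lv[2]
}

# Sensitivity: share of true events that were predicted as events
sens <- function(truth, estimate, event) {
  is_event <- truth == event
  mean(estimate[is_event] == event)
}

# Percentile bootstrap: resample rows with replacement, recompute the metric,
# and take the alpha/2 and 1 - alpha/2 quantiles of the resampled values.
# Note: set.seed() changes the global RNG state.
bootstrap_metric_ci <- function(truth, estimate, event,
                                samples = 500, seed = 1234, level = 0.95) {
  if (!is.null(seed)) set.seed(seed)
  n <- length(truth)
  stats <- replicate(samples, {
    idx <- sample.int(n, n, replace = TRUE)
    sens(truth[idx], estimate[idx], event)
  })
  alpha <- (1 - level) / 2
  # Resamples with no true events give NaN and are dropped by na.rm = TRUE
  c(estimate = sens(truth, estimate, event),
    quantile(stats, c(alpha, 1 - alpha), na.rm = TRUE))
}

truth <- factor(c("yes", "no", "yes", "yes", "no", "no", "yes", "no", "yes", "no"),
                levels = c("yes", "no"))
estimate <- factor(c("yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes", "no"),
                   levels = c("yes", "no"))
ev <- event_level(truth, "first")  # "yes"
sens(truth, estimate, ev)          # 0.8
bootstrap_metric_ci(truth, estimate, ev, samples = 500, seed = 1234)
```

Fixing the seed makes the interval reproducible across runs; increasing the number of resamples reduces Monte Carlo noise in the interval endpoints at the cost of more metric evaluations.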