Evaluates the trained models on the test data and computes performance metrics.

Usage:
fastml_compute_holdout_results(
models,
train_data,
test_data,
label,
start_col = NULL,
time_col = NULL,
status_col = NULL,
task,
metric = NULL,
event_class,
class_threshold = "auto",
eval_times = NULL,
bootstrap_ci = TRUE,
bootstrap_samples = 500,
bootstrap_seed = 1234,
at_risk_threshold = 0.1,
survival_metric_convention = "fastml",
precomputed_predictions = NULL,
summaryFunction = NULL,
multiclass_auc = "macro"
)

Value:

A list with two elements:

- A named list of performance metric tibbles for each model.
- A named list of per-model data frames with columns including the truth, the predictions, and the class probabilities.
Arguments:

models: A list of trained model objects.

train_data: Preprocessed training data frame.

test_data: Preprocessed test data frame.

label: Name of the target variable. For survival analysis this should be a character vector of length two giving the names of the time and status columns.
start_col: Optional string. The name of the column specifying the start time in counting process (e.g., `(start, stop, event)`) survival data. Only used when task = "survival".

time_col: String. The name of the column specifying the event or censoring time (the "stop" time in counting process data). Only used when task = "survival".

status_col: String. The name of the column specifying the event status (e.g., 0 for censored, 1 for event). Only used when task = "survival".
task: Type of task: "classification", "regression", or "survival".

metric: The performance metric to optimize (e.g., "accuracy", "rmse").

event_class: A single string, either "first" or "second", specifying which level of the truth factor is treated as the "event".
class_threshold: For binary classification, controls how class probabilities are converted into hard class predictions. Numeric values in (0, 1) set a fixed threshold. The default `"auto"` tunes a threshold on the training data to maximize F1; use `"model"` to keep the model's default threshold.
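A minimal sketch of what a fixed numeric threshold does, using a hypothetical probability vector for the event class (the object names here are illustrative, not fastml internals):

```r
# Illustrative only: converting predicted event probabilities into hard
# class labels with a fixed threshold in (0, 1).
prob_event <- c(0.20, 0.55, 0.80)  # hypothetical predicted probabilities
threshold  <- 0.35                 # a fixed class_threshold value

# Observations at or above the threshold are labelled as the event class.
pred_class <- ifelse(prob_event >= threshold, "event", "non_event")
# pred_class is "non_event" "event" "event"
```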
eval_times: Optional numeric vector of evaluation horizons for survival metrics. Passed through to process_model.

bootstrap_ci: Logical indicating whether bootstrap confidence intervals should be computed for the evaluation metrics.

bootstrap_samples: Number of bootstrap resamples used when bootstrap_ci = TRUE.

bootstrap_seed: Optional integer seed for the bootstrap procedure used in metric estimation.
at_risk_threshold: Minimum proportion of subjects that must remain at risk to define t_max when computing survival metrics such as the integrated Brier score.
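A minimal sketch of the at-risk cutoff idea, using hypothetical follow-up times; this illustrates the rule (largest time at which at least the given proportion of subjects is still at risk), not fastml's internal implementation:

```r
# Illustrative only: pick t_max as the largest observed time at which at
# least `at_risk_threshold` of subjects remain at risk.
times <- c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89)  # hypothetical follow-up times
at_risk_threshold <- 0.1

# Proportion of subjects still at risk at each time point.
prop_at_risk <- sapply(times, function(t) mean(times >= t))

t_max <- max(times[prop_at_risk >= at_risk_threshold])
```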
survival_metric_convention: Character string specifying which survival metric conventions to follow. `"fastml"` (default) uses fastml's internal defaults for evaluation horizons and t_max. `"tidymodels"` uses `eval_times` as the explicit evaluation grid and applies yardstick-style Brier/IBS normalization; when `eval_times` is `NULL`, time-dependent Brier metrics are omitted.

precomputed_predictions: Optional data frame or nested list of previously generated predictions (per algorithm/engine) to reuse instead of recomputing. This is mainly used when combining results across engines.
summaryFunction: Optional custom classification metric function passed through to process_model for holdout evaluation.

multiclass_auc: For multiclass ROC AUC, the averaging method to use: `"macro"` (default, tidymodels) or `"macro_weighted"`. Macro weights each class equally, while macro-weighted weights by class prevalence and can change model rankings on imbalanced data.
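A hedged usage sketch based on the signature above. `fit_list`, `train_df`, `test_df`, and the `"outcome"` column name are hypothetical stand-ins for objects produced by an earlier fastml training step; this is not a runnable example:

```r
res <- fastml_compute_holdout_results(
  models      = fit_list,  # hypothetical: named list of trained model objects
  train_data  = train_df,  # hypothetical: preprocessed training data frame
  test_data   = test_df,   # hypothetical: preprocessed test data frame
  label       = "outcome", # hypothetical target column name
  task        = "classification",
  metric      = "accuracy",
  event_class = "first"
)
# `res` is a list with two elements: a named list of per-model metric
# tibbles and a named list of per-model prediction data frames.
```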