fastml (version 0.7.0)

process_model: Process and Evaluate a Model Workflow

Description

This function processes a fitted model or a tuning result. If tuning was used, it finalizes the model; it then makes predictions on the test set and computes performance metrics appropriate to the task type (classification, regression, or survival). It supports binary and multiclass classification, and handles probabilistic outputs when the modeling engine supports them.
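The steps described above (select the best tuning result, finalize, refit on the training data, predict on the test set, compute metrics) can be sketched with standard tidymodels functions. This is an illustrative approximation, not fastml's internal code: the dataset (`mtcars`), the `rpart` decision tree, the fold count, and all object names are assumptions chosen for the example.

```r
library(tidymodels)

set.seed(123)

# Assumed toy data: predict mpg from the other mtcars columns.
split      <- initial_split(mtcars, prop = 0.8)
train_data <- training(split)
test_data  <- testing(split)

# A workflow with one tunable hyperparameter.
spec <- decision_tree(cost_complexity = tune(), min_n = 5) |>
  set_engine("rpart") |>
  set_mode("regression")

wf <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(spec)

# Tuning produces a `tune_results` object.
folds <- vfold_cv(train_data, v = 3)
tuned <- tune_grid(wf, resamples = folds, grid = 5,
                   metrics = metric_set(rmse))

# Pick the best hyperparameters by the chosen metric, finalize the
# workflow, and refit it on the full training data.
best      <- select_best(tuned, metric = "rmse")
final_wf  <- finalize_workflow(wf, best)
final_fit <- fit(final_wf, data = train_data)

# Predict on the test set and compute regression metrics
# (yardstick::metrics() reports rmse, rsq, and mae for numeric outcomes).
predictions <- predict(final_fit, new_data = test_data) |>
  bind_cols(test_data["mpg"])
performance <- yardstick::metrics(predictions, truth = mpg, estimate = .pred)
```

Here `performance` and `predictions` play the same roles as the two elements of the list that `process_model()` returns, although their exact columns may differ from what fastml produces.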

Usage

process_model(
  model_obj,
  model_id,
  task,
  test_data,
  label,
  event_class,
  start_col = NULL,
  time_col = NULL,
  status_col = NULL,
  engine,
  train_data,
  metric,
  eval_times_user = NULL,
  bootstrap_ci = TRUE,
  bootstrap_samples = 500,
  bootstrap_seed = 1234,
  at_risk_threshold = 0.1
)

Value

A list with two elements:

performance

A tibble with computed performance metrics.

predictions

A tibble with predicted values, the corresponding truth values, and class probabilities (if applicable).

Arguments

model_obj

A fitted model or a tuning result (`tune_results` object).

model_id

A character identifier for the model (used in warnings).

task

Type of task: one of `"classification"`, `"regression"`, or `"survival"`.

test_data

A data frame containing the test data.

label

The name of the outcome variable (as a character string).

event_class

For binary classification, specifies which class is considered the positive class: `"first"` or `"second"`.

start_col

Optional string. The name of the column specifying the start time in counting-process survival data (e.g., `(start, stop, event)`). Only used when `task = "survival"`.

time_col

String. The name of the column specifying the event or censoring time (the "stop" time in counting-process data). Only used when `task = "survival"`.

status_col

String. The name of the column specifying the event status (e.g., 0 for censored, 1 for event). Only used when `task = "survival"`.

engine

A character string indicating the model engine (e.g., `"xgboost"`, `"randomForest"`). Used to determine if class probabilities are supported. If `NULL`, probabilities are skipped.

train_data

A data frame containing the training data, required to refit finalized workflows.

metric

The name of the metric (e.g., `"roc_auc"`, `"accuracy"`, `"rmse"`) used for selecting the best tuning result.

eval_times_user

Optional numeric vector of time horizons at which to evaluate survival Brier scores. When `NULL`, sensible defaults based on the observed follow-up distribution are used.

bootstrap_ci

Logical; if `TRUE`, bootstrap confidence intervals are estimated for survival performance metrics.

bootstrap_samples

Integer giving the number of bootstrap resamples used when computing confidence intervals.

bootstrap_seed

Optional integer seed applied before bootstrap resampling to make interval estimates reproducible.

at_risk_threshold

Numeric value between 0 and 1 defining the minimum proportion of subjects required to remain at risk when determining the maximum follow-up time used in survival metrics.
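To make the role of `at_risk_threshold` concrete, the sketch below shows one plausible reading of the description: the maximum follow-up time is the largest observed time at which at least the given proportion of subjects is still at risk. The helper `max_followup()` and the toy follow-up times are hypothetical, and censoring and left truncation are ignored for simplicity; fastml's actual rule may differ.

```r
# Hypothetical helper (not part of fastml): largest observed time t at which
# the proportion of subjects with time >= t is at least `at_risk_threshold`.
max_followup <- function(time, at_risk_threshold = 0.1) {
  stopifnot(at_risk_threshold >= 0, at_risk_threshold <= 1)
  candidates <- sort(unique(time))
  # Proportion of subjects still at risk at each candidate time.
  prop_at_risk <- vapply(candidates, function(t) mean(time >= t), numeric(1))
  eligible <- candidates[prop_at_risk >= at_risk_threshold]
  if (length(eligible) == 0) NA_real_ else max(eligible)
}

# Assumed toy follow-up times for ten subjects.
follow_up <- c(2, 3, 5, 7, 8, 11, 13, 17, 19, 23)

horizon_25 <- max_followup(follow_up, at_risk_threshold = 0.25)
horizon_50 <- max_followup(follow_up, at_risk_threshold = 0.5)
```

A larger threshold gives a shorter horizon, which keeps time-dependent metrics away from the sparse tail of the follow-up distribution.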

Details

- If the input `model_obj` is a `tune_results` object, the function finalizes the model using the best hyperparameters according to the specified `metric`, and refits the model on the full training data.

- For classification tasks, performance metrics include accuracy, kappa, sensitivity, specificity, precision, F1-score, and ROC AUC (if probabilities are available).

- For regression tasks, RMSE, R-squared, and MAE are returned.

- If the number of predictions does not match the number of rows in `test_data` (typically because of missing values), an informative error is thrown that recommends imputing the data during preprocessing.
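As a reference for the binary classification metrics listed above, the following base R sketch computes accuracy, sensitivity, specificity, precision, and F1 from hard class predictions. It is not fastml's implementation: the helper `binary_metrics()` and the toy vectors are hypothetical, and it assumes that `event_class = "first"` means the first factor level is the positive class. Kappa and ROC AUC are omitted to keep the example short.

```r
# Hypothetical helper (not part of fastml): binary metrics from factor vectors.
binary_metrics <- function(truth, estimate, event_class = "first") {
  lv <- levels(truth)
  # Assumed semantics: "first" selects the first factor level as positive.
  positive <- if (event_class == "first") lv[1] else lv[2]

  tp <- sum(truth == positive & estimate == positive)
  tn <- sum(truth != positive & estimate != positive)
  fp <- sum(truth != positive & estimate == positive)
  fn <- sum(truth == positive & estimate != positive)

  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  prec <- tp / (tp + fp)

  c(
    accuracy    = (tp + tn) / length(truth),
    sensitivity = sens,
    specificity = spec,
    precision   = prec,
    f1          = 2 * prec * sens / (prec + sens)
  )
}

# Assumed toy data: 3 positives ("yes") and 5 negatives ("no").
truth    <- factor(c("yes", "yes", "yes", "no", "no", "no", "no", "no"),
                   levels = c("yes", "no"))
estimate <- factor(c("yes", "yes", "no", "no", "no", "yes", "no", "no"),
                   levels = c("yes", "no"))

metrics_first  <- binary_metrics(truth, estimate, event_class = "first")
metrics_second <- binary_metrics(truth, estimate, event_class = "second")
```

Switching `event_class` leaves accuracy unchanged but swaps sensitivity and specificity, which is why the choice of positive class matters when comparing models.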