Finalizes a tuning result, or uses an already fitted workflow, to generate predictions on test data and compute performance metrics.
process_model(model_obj, model_id, task, test_data, label, event_class, engine)
A list with two components:
- A data frame of performance metrics. For classification tasks, metrics include accuracy, kappa, sensitivity, specificity, precision, F-measure, and ROC AUC (when applicable). For regression tasks, metrics include RMSE, R-squared, and MAE.
- A data frame containing the test data augmented with predicted classes and, when applicable, predicted probabilities.
model_obj: A model object, which can be either a tuning result (an object inheriting from "tune_results") or an already fitted workflow.
model_id: A unique identifier for the model, used in warning messages if issues arise during processing.
task: A character string indicating the type of task, either "classification" or "regression".
test_data: A data frame containing the test data on which predictions will be generated.
label: A character string specifying the name of the outcome variable in test_data.
event_class: For classification tasks, a character string specifying which event class to treat as positive (accepted values: "first" or "second").
engine: A character string specifying the modeling engine used. This parameter affects prediction types and metric computations.
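Given the usage above, a call might look like the following. All object names here (fitted_wf, test_df, the outcome column "outcome") are illustrative, and the order of the returned list components follows the Value section:

```r
# Illustrative call: `fitted_wf` is a workflow already fitted on training
# data, and `test_df` is a held-out test set (both hypothetical objects).
res <- process_model(
  model_obj   = fitted_wf,
  model_id    = "rf_1",
  task        = "classification",
  test_data   = test_df,
  label       = "outcome",
  event_class = "first",
  engine      = "ranger"
)

res[[1]]  # data frame of performance metrics
res[[2]]  # test data augmented with predictions
```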
The function first checks whether model_obj is a tuning result. If so, it attempts to:
1. Select the best tuning parameters using tune::select_best() (note that the metric used for selection should be defined in the calling environment).
2. Extract the model specification and preprocessor from model_obj using workflows::pull_workflow_spec() and workflows::pull_workflow_preprocessor(), respectively.
3. Finalize the model specification with the selected parameters via tune::finalize_model().
4. Rebuild the workflow using workflows::workflow(), workflows::add_recipe(), and workflows::add_model(), and fit the finalized workflow with parsnip::fit() on the training data (the variable train_data is expected to be available in the environment).
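The steps above can be sketched as follows. This mirrors the documented sequence rather than the actual implementation; `model_obj`, `train_data`, and `selection_metric` are assumed to exist in the calling environment:

```r
library(tune)
library(workflows)
library(parsnip)

# Sketch of the finalization path, assuming `model_obj` is a tuning
# result and `selection_metric` names the metric (e.g. "roc_auc").
best_params <- tune::select_best(model_obj, metric = selection_metric)
model_spec  <- workflows::pull_workflow_spec(model_obj)
model_rec   <- workflows::pull_workflow_preprocessor(model_obj)

# Plug the selected parameters into the model specification.
final_spec <- tune::finalize_model(model_spec, best_params)

# Rebuild the workflow and refit on the training data.
final_wf <- workflows::workflow() |>
  workflows::add_recipe(model_rec) |>
  workflows::add_model(final_spec)

fitted_wf <- parsnip::fit(final_wf, data = train_data)
```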
If model_obj is already a fitted workflow, it is used directly.
For classification tasks, the function makes class predictions (and probability predictions if engine is not "LiblineaR") and computes performance metrics using functions from the yardstick package. In binary classification, the positive class is determined by the event_class argument, and ROC AUC is computed accordingly. For multiclass classification, macro-averaged metrics and ROC AUC (using weighted estimates) are calculated.
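The binary-classification metrics listed above can be computed with yardstick along these lines. The predictions frame `preds` and its column names (`truth`, `.pred_class`, and the event-class probability column `.pred_yes`) are assumptions for illustration, not the function's actual internals:

```r
library(yardstick)

# Class metrics named in the documentation, bundled into one metric set.
cls_metrics <- yardstick::metric_set(
  accuracy, kap, sens, spec, precision, f_meas
)

# `event_class` maps directly onto yardstick's `event_level` argument.
metrics_df <- cls_metrics(
  preds,
  truth = truth, estimate = .pred_class,
  event_level = event_class
)

# ROC AUC needs the probability column for the positive class.
roc_df <- yardstick::roc_auc(
  preds,
  truth = truth, .pred_yes,
  event_level = event_class
)
```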
For regression tasks, the function predicts outcomes and computes regression metrics (RMSE, R-squared, and MAE).
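The regression metrics can likewise be bundled into a single yardstick metric set. The predictions frame `preds` and its `truth`/`.pred` columns are illustrative assumptions:

```r
library(yardstick)

# RMSE, R-squared, and MAE, as named in the documentation.
reg_metrics <- yardstick::metric_set(rmse, rsq, mae)
metrics_df  <- reg_metrics(preds, truth = truth, estimate = .pred)
```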
If the number of predictions does not match the number of rows in test_data, the function stops with an informative error message regarding missing values and imputation options.
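A minimal base-R sketch of this consistency check, with a hypothetical helper name (the actual check is inlined in process_model):

```r
# Stop with an informative error when predictions and test rows disagree;
# a mismatch usually means rows of test_data were dropped (e.g. due to
# missing values) during prediction.
check_predictions <- function(predictions, test_data, model_id) {
  if (nrow(predictions) != nrow(test_data)) {
    stop(
      "Model ", model_id, ": number of predictions (", nrow(predictions),
      ") does not match the number of rows in test_data (",
      nrow(test_data), "). This usually indicates missing values in ",
      "test_data; consider imputing them before prediction."
    )
  }
  invisible(TRUE)
}
```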