train_models: Train Specified Machine Learning Algorithms on the Training Data

Description

Trains specified machine learning algorithms on the preprocessed training data.

Usage

train_models(
  train_data,
  label,
  task,
  algorithms,
  resampling_method,
  folds,
  repeats,
  group_cols = NULL,
  block_col = NULL,
  block_size = NULL,
  initial_window = NULL,
  assess_window = NULL,
  skip = 0,
  outer_folds = NULL,
  resamples = NULL,
  tune_params,
  engine_params = list(),
  metric,
  summaryFunction = NULL,
  seed = 123,
  recipe,
  use_default_tuning = FALSE,
  tuning_strategy = "grid",
  tuning_iterations = 10,
  early_stopping = FALSE,
  adaptive = FALSE,
  algorithm_engines = NULL,
  event_class = "first",
  start_col = NULL,
  time_col = NULL,
  status_col = NULL,
  eval_times = NULL,
  at_risk_threshold = 0.1,
  audit_env = NULL
)

Value

A list of trained model objects.

Arguments

train_data: Preprocessed training data frame.
label: Name of the target variable.
task: Type of task: "classification", "regression", or "survival".
algorithms: Vector of algorithm names to train.
resampling_method: Resampling method for cross-validation. Supported options include standard "cv", "repeatedcv", and "boot", as well as grouped resampling via "grouped_cv", blocked/rolling schemes via "blocked_cv" or "rolling_origin", nested resampling via "nested_cv", and the passthrough "none" option.
folds: Number of folds for cross-validation.
repeats: Number of times to repeat cross-validation (only applicable for methods like "repeatedcv").
group_cols: Optional character vector of grouping columns used with `resampling_method = "grouped_cv"`. For classification problems the outcome column is used to request grouped stratification where supported; if class imbalance prevents stratification, grouped folds are still created and a warning is emitted to document the limitation.
block_col: Optional name of the ordering column used with blocked or rolling resampling.
block_size: Optional integer specifying the block size for `resampling_method = "blocked_cv"`.
initial_window: Optional integer specifying the initial window size for rolling resampling.
assess_window: Optional integer specifying the assessment window size for rolling resampling.
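skip: Optional integer giving the number of resamples to skip between consecutive rolling resamples, which thins the resulting set of splits. Defaults to 0 (no resamples skipped).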
outer_folds: Optional integer specifying the number of outer folds for `resampling_method = "nested_cv"`.
resamples: Optional rsample object. If provided, custom resampling splits will be used instead of those created internally.
tune_params: A named list of tuning ranges. For each algorithm, supply a list of engine-specific parameter values, e.g. list(rand_forest = list(ranger = list(mtry = c(1, 3)))).
engine_params: A named list of fixed engine-level arguments passed directly to the model fitting call for each algorithm/engine combination. Use this to control options like ties = "breslow" for Cox models or importance = "impurity" for ranger. Unlike tune_params, these values are not tuned over a grid.
metric: The performance metric to optimize.
summaryFunction: A custom summary function for model evaluation. Default is NULL.
seed: An integer value specifying the random seed for reproducibility.
recipe: A recipe object for preprocessing.
use_default_tuning: Logical; if TRUE and tune_params is NULL, tuning is performed using default grids. Tuning also occurs when custom tune_params are supplied. When FALSE and no custom parameters are given, the model is fitted once with default settings.
tuning_strategy: A string specifying the tuning strategy. Must be one of "grid", "bayes", or "none". Adaptive (racing) methods, requested via adaptive = TRUE, may be used with "grid". If "none" is selected, the workflow is fitted directly without tuning; any custom tune_params supplied alongside tuning_strategy = "none" are ignored with a warning.
tuning_iterations: Number of iterations for Bayesian tuning. The value is used, and validated, only when tuning_strategy = "bayes"; it is ignored otherwise.
early_stopping: Logical; whether to use early stopping during Bayesian tuning.
adaptive: Logical indicating whether to use adaptive/racing methods.
algorithm_engines: A named list specifying the engine to use for each algorithm.
event_class: Character string indicating which outcome class is treated as the positive (event) class when computing classification metrics. Must be "first" or "second".
start_col: Optional name of the survival start time column passed through to downstream evaluation helpers.
time_col: Optional name of the survival stop time column.
status_col: Optional name of the survival status/event column.
eval_times: Optional numeric vector of time horizons for survival metrics.
at_risk_threshold: Numeric cutoff used to determine the evaluation window for survival metrics within guarded resampling.
audit_env: Internal environment that tracks security audit findings when custom preprocessing hooks are executed. Typically supplied by fastml() and should be left as NULL when calling train_models() directly.
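
The nested list shapes described for tune_params, engine_params, and algorithm_engines can be sketched as below. This is a minimal illustration only: the "rand_forest" / "ranger" names and the mtry and importance values come from the argument descriptions above, while the nesting of engine_params (algorithm, then engine) and the form of algorithm_engines (algorithm name mapped to an engine name) are assumptions that mirror tune_params and are not confirmed by this page.

```r
# Tuning ranges: algorithm -> engine -> parameter values
# (shape taken from the tune_params description).
tune_params <- list(
  rand_forest = list(
    ranger = list(mtry = c(1, 3))
  )
)

# Fixed engine arguments that are passed through, not tuned.
# ASSUMPTION: nested by algorithm, then engine, like tune_params.
engine_params <- list(
  rand_forest = list(
    ranger = list(importance = "impurity")
  )
)

# Engine choice per algorithm.
# ASSUMPTION: a named list mapping algorithm name to engine name.
algorithm_engines <- list(rand_forest = "ranger")

# Inspect the structure before passing the lists on.
str(tune_params)
str(engine_params)
```

Keeping tuned ranges and fixed engine options in separate lists makes it explicit which values are searched over and which are held constant.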
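
For the resamples argument, a pre-built rsample object might be created along the lines of the sketch below. It uses the built-in mtcars data purely for illustration; the fold counts, the grouping column, and the window sizes are arbitrary choices, and treating the row order of mtcars as a time order is an assumption made only so the rolling example has something to split. How train_models() consumes each object is not shown here.

```r
library(rsample)

set.seed(123)

# Plain 5-fold cross-validation.
cv_splits <- vfold_cv(mtcars, v = 5)

# Grouped folds: all rows sharing a value of `cyl` stay in the same fold.
grouped_splits <- group_vfold_cv(mtcars, group = cyl, v = 3)

# Rolling-origin splits: 20-row initial window, 4-row assessment window,
# skipping 1 resample between consecutive splits.
# ASSUMPTION: rows are already ordered in time.
rolling_splits <- rolling_origin(mtcars, initial = 20, assess = 4, skip = 1)

# Each object is an rset; one row per split.
nrow(cv_splits)
nrow(grouped_splits)
```

Building the splits up front is useful when the same partitions need to be reused across several runs or compared against another modelling pipeline.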