tuneandtrainExtBoost: Tune and Train External Boosting

Description

This function tunes and trains a Boosting classifier using the mboost::glmboost function. It provides two strategies for tuning the number of boosting iterations (mstop) based on the estperf argument:

When estperf = FALSE (default): mstop is tuned on the external validation dataset, and the value that gives the highest AUC there is selected. No AUC value is returned in this case, because an AUC computed on the same data that were used for tuning would be optimistically biased.
When estperf = TRUE: mstop is tuned internally, using only the training dataset. The selected model is then evaluated on the external dataset, which provides a conservative (slightly pessimistic) AUC estimate.
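
The external tuning strategy (estperf = FALSE) can be sketched roughly as follows. This is an illustration only, not the package's actual implementation: the helpers auc_rank and tune_mstop_external and the simulated data are hypothetical, and the sketch assumes that the response is a two-level factor in the first column and that both data frames share the same predictor names.

```r
library(mboost)  # provides glmboost(), Binomial(), boost_control()

# Rank-based (Mann-Whitney) AUC. `labels` is a two-level factor and
# `scores` are predicted probabilities for its second level.
auc_rank <- function(labels, scores) {
  pos <- labels == levels(labels)[2]
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)  # average ranks, so ties are handled
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical helper: return the mstop with the highest AUC on `dataext`.
tune_mstop_external <- function(data, dataext, mstop_seq, nu = 0.1) {
  names(data)[1] <- "y"      # response is assumed to be the first column
  names(dataext)[1] <- "y"
  # Fit once up to the largest candidate, then subset to each mstop.
  fit <- glmboost(y ~ ., data = data, family = Binomial(),
                  control = boost_control(mstop = max(mstop_seq), nu = nu))
  aucs <- vapply(mstop_seq, function(m) {
    p <- predict(fit[m], newdata = dataext, type = "response")
    auc_rank(dataext$y, as.numeric(p))
  }, numeric(1))
  mstop_seq[which.max(aucs)]
}

# Simulated data, purely for illustration.
set.seed(1)
make_data <- function(n) {
  x <- matrix(rnorm(n * 5), n, 5)
  y <- factor(rbinom(n, 1, plogis(x[, 1] - x[, 2])))
  data.frame(y = y, x)
}
train <- make_data(100)
ext <- make_data(100)

best_mstop <- tune_mstop_external(train, ext, mstop_seq = seq(10, 100, by = 10))
print(best_mstop)
```

Because the external data decide which mstop wins, the AUC of the winning model on those same data is not an honest performance estimate, which is why the sketch returns only the selected mstop.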

Usage

tuneandtrainExtBoost(
  data,
  dataext,
  estperf = FALSE,
  mstop_seq = seq(5, 1000, by = 5),
  nu = 0.1
)

Value

A list containing the following components:

best_mstop: The optimal number of boosting iterations determined during the tuning process.
best_model: The trained Boosting model using the selected mstop.
est_auc: The AUC value evaluated on the external dataset. This is only returned when estperf = TRUE, providing a conservative (slightly pessimistic) estimate of the model's performance.
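
For orientation, the sketch below shows one way an est_auc of this kind could be obtained when estperf = TRUE. The page does not say how the internal tuning is carried out, so the use of 5-fold cross-validated risk via mboost::cvrisk is an assumption, as are the simulated data and the upper bound of 200 iterations.

```r
library(mboost)  # provides glmboost(), cvrisk(), cv(), mstop()

# Simulated data, purely for illustration.
set.seed(2)
make_data <- function(n) {
  x <- matrix(rnorm(n * 5), n, 5)
  data.frame(y = factor(rbinom(n, 1, plogis(x[, 1] - x[, 2]))), x)
}
train <- make_data(100)
ext <- make_data(100)

# Fit once with a generous upper bound on the number of iterations.
fit <- glmboost(y ~ ., data = train, family = Binomial(),
                control = boost_control(mstop = 200, nu = 0.1))

# Internal tuning (assumption: 5-fold cross-validated risk on the
# training data only; the external data are not touched here).
folds <- cv(model.weights(fit), type = "kfold", B = 5)
cvr <- cvrisk(fit, folds = folds, grid = 1:mstop(fit))
best_mstop <- mstop(cvr)

# External validation: rank-based (Mann-Whitney) AUC on the external data.
p <- as.numeric(predict(fit[best_mstop], newdata = ext, type = "response"))
pos <- ext$y == levels(ext$y)[2]
n_pos <- sum(pos)
n_neg <- sum(!pos)
est_auc <- (sum(rank(p)[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(best_mstop)
print(est_auc)
```

Since the external data take no part in choosing mstop here, the resulting AUC is not inflated by tuning.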

Arguments

data: A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables.
dataext: A data frame containing the external validation data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables.
estperf: A logical value. If TRUE, mstop is tuned internally on data and the AUC is then estimated on dataext. If FALSE, mstop is tuned on dataext and no AUC is returned. Default is FALSE.
mstop_seq: A numeric vector specifying the sequence of boosting iterations to evaluate. Default is seq(5, 1000, by = 5).
nu: A numeric value specifying the learning rate for boosting. Default is 0.1.

Examples


# Load sample data
data(sample_data_train)
data(sample_data_extern)

# Example usage with external tuning (default)
mstop_seq <- seq(50, 500, by = 50)
result <- tuneandtrainExtBoost(sample_data_train, sample_data_extern, 
  mstop_seq = mstop_seq, nu = 0.1)
print(result$best_mstop)         # Optimal mstop
print(result$best_model)         # Trained Boosting model
# Note: est_auc is not returned when estperf = FALSE

# Example usage with internal tuning and external validation
result_internal <- tuneandtrainExtBoost(sample_data_train, sample_data_extern, 
  estperf = TRUE, mstop_seq = mstop_seq, nu = 0.1)
print(result_internal$best_mstop) # Optimal mstop
print(result_internal$best_model) # Trained Boosting model
print(result_internal$est_auc)    # AUC on external validation dataset
