Coxmos (version 1.1.2)

cv.isb.splsdacox: Iterative SB.sPLS-DACOX-Dynamic Cross-Validation

Description

This function performs cross-validation for the iterative single-block sparse partial least squares model sPLS-DACOX-Dynamic. It returns the optimal number of components and the optimal number of variables to select, based on cross-validation. Performance can be evaluated with several metrics, such as Area Under the Curve (AUC), Integrated Brier Score, or C-Index. More than one metric can be used at once through the weight arguments (w_AIC, w_C.Index, w_AUC, w_I.BRIER).

Usage

cv.isb.splsdacox(
  X,
  Y,
  max.ncomp = 8,
  vector = NULL,
  MIN_NVAR = 10,
  MAX_NVAR = NULL,
  n.cut_points = 5,
  MIN_AUC_INCREASE = 0.01,
  EVAL_METHOD = "AUC",
  n_run = 3,
  k_folds = 10,
  x.center = TRUE,
  x.scale = FALSE,
  remove_near_zero_variance = TRUE,
  remove_zero_variance = TRUE,
  toKeep.zv = NULL,
  remove_variance_at_fold_level = FALSE,
  remove_non_significant_models = FALSE,
  remove_non_significant = FALSE,
  alpha = 0.05,
  w_AIC = 0,
  w_C.Index = 0,
  w_AUC = 1,
  w_I.BRIER = 0,
  times = NULL,
  max_time_points = 15,
  MIN_AUC = 0.8,
  MIN_COMP_TO_CHECK = 3,
  pred.attr = "mean",
  pred.method = "cenROC",
  fast_mode = FALSE,
  max.iter = 200,
  MIN_EPV = 5,
  return_models = FALSE,
  returnData = FALSE,
  PARALLEL = FALSE,
  verbose = FALSE,
  seed = 123
)

Value

An instance of class "Coxmos" and model "cv.SB.sPLS-DACOX-Dynamic", containing:

  • best_model_info: Data frame with the best model's information.

  • df_results_folds: Data frame with fold-level results.

  • df_results_runs: Data frame with run-level results.

  • df_results_comps: Data frame with component-level results.

  • list_cv_spls_models: List of cross-validated models for each block.

  • opt.comp: Optimal number of components.

  • opt.nvar: Optimal number of variables selected.

  • class: Model class.

  • time: Time taken to run the cross-validation.

Arguments

X

List of numeric matrices or data.frames. Explanatory variables. Qualitative variables must be transformed into binary variables.

Y

Numeric matrix or data.frame. Response variables. Must contain two columns: "time" and "event". In the event column, 0 or FALSE marks a censored observation and 1 or TRUE marks an event.

max.ncomp

Numeric. Maximum number of PLS components to compute during cross-validation (default: 8).

vector

Numeric vector. A vector indicating the number of variables to select for each block and component (default: NULL).

MIN_NVAR

Numeric. Minimum number of variables to select in the model (default: 10).

MAX_NVAR

Numeric. Maximum number of variables to select in the model (default: NULL).

n.cut_points

Numeric. Number of cut points to evaluate the number of variables (default: 5).

MIN_AUC_INCREASE

Numeric. Minimum improvement in AUC required between models to continue evaluation (default: 0.01).

EVAL_METHOD

Character. Method for evaluating performance. Must be one of "AUC", "C-Index", etc. (default: "AUC").

n_run

Numeric. Number of runs for cross-validation (default: 3).

k_folds

Numeric. Number of folds for cross-validation (default: 10).

x.center

Logical. If TRUE, the X matrix is centered to zero means (default: TRUE).

x.scale

Logical. If TRUE, the X matrix is scaled to unit variances (default: FALSE).

remove_near_zero_variance

Logical. If TRUE, near-zero variance variables are removed (default: TRUE).

remove_zero_variance

Logical. If TRUE, zero-variance variables are removed (default: TRUE).

toKeep.zv

Character vector. Names of variables in X to retain despite variance filtering (default: NULL).

remove_variance_at_fold_level

Logical. If TRUE, variance filtering is applied at the fold level (default: FALSE).

remove_non_significant_models

Logical. If TRUE, models with non-significant components are removed before evaluation (default: FALSE).

remove_non_significant

Logical. If TRUE, non-significant components in the final Cox model are removed (default: FALSE).

alpha

Numeric. Significance threshold for selecting variables/components (default: 0.05).

w_AIC

Numeric. Weight for AIC in the evaluation. All weights must sum to 1 (default: 0).

w_C.Index

Numeric. Weight for C-Index in the evaluation. All weights must sum to 1 (default: 0).

w_AUC

Numeric. Weight for AUC in the evaluation. All weights must sum to 1 (default: 1).

w_I.BRIER

Numeric. Weight for Integrated Brier Score in the evaluation. All weights must sum to 1 (default: 0).

times

Numeric vector. Time points for AUC evaluation (default: NULL).

max_time_points

Numeric. Maximum number of time points for AUC evaluation (default: 15).

MIN_AUC

Numeric. Minimum AUC to achieve during cross-validation (default: 0.8).

MIN_COMP_TO_CHECK

Numeric. Number of components to evaluate before stopping if no improvement is observed (default: 3).

pred.attr

Character. Statistic used to summarize the selected evaluation metric. Must be one of "mean" or "median" (default: "mean").

pred.method

Character. AUC evaluation method. Must be one of: "risksetROC", "survivalROC", "cenROC", etc. (default: "cenROC").

fast_mode

Logical. If TRUE, only one fold is evaluated per run; otherwise, all folds are evaluated simultaneously (default: FALSE).

max.iter

Numeric. Maximum number of iterations for convergence (default: 200).

MIN_EPV

Numeric. Minimum number of Events Per Variable for the final Cox model (default: 5).

return_models

Logical. If TRUE, returns all models computed during cross-validation (default: FALSE).

returnData

Logical. If TRUE, returns original and normalized X and Y matrices (default: FALSE).

PARALLEL

Logical. If TRUE, runs cross-validation in parallel using multiple cores (default: FALSE).

verbose

Logical. If TRUE, extra messages are displayed during execution (default: FALSE).

seed

Numeric. Seed for reproducibility (default: 123).
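
The input requirements described for X, Y, and the metric weights can be sketched in base R as follows. This is only an illustration under assumptions: the toy data, block names (omic, clinical), and column names are hypothetical, and model.matrix() is just one common way to turn a qualitative variable into binary columns.

```r
# Hypothetical toy data: 6 samples, one numeric block and one clinical block
set.seed(1)
n <- 6
samples <- paste0("s", seq_len(n))

# A purely numeric block; the column names are made up for illustration
omic <- matrix(rnorm(n * 3), nrow = n,
               dimnames = list(samples, c("g1", "g2", "g3")))

# A block containing a qualitative variable that must become binary columns
clinical <- data.frame(age = c(54, 61, 47, 70, 66, 58),
                       stage = factor(c("I", "II", "III", "I", "II", "III")),
                       row.names = samples)

# model.matrix() expands the factor into 0/1 indicator columns;
# [, -1] drops the intercept column it adds
clinical_num <- model.matrix(~ age + stage, data = clinical)[, -1, drop = FALSE]

# X: a named list of numeric matrices, one per block
X <- list(omic = omic, clinical = clinical_num)

# Y: two columns named "time" and "event" (1 = event, 0 = censored)
Y <- data.frame(time = c(12, 30, 7, 45, 22, 16),
                event = c(1, 0, 1, 0, 1, 1),
                row.names = samples)

# The four metric weights must sum to 1
weights <- c(w_AIC = 0, w_C.Index = 0.5, w_AUC = 0.5, w_I.BRIER = 0)
stopifnot(isTRUE(all.equal(sum(weights), 1)))
```

Keeping the same row names in every block and in Y makes it easy to confirm that all inputs describe the same samples in the same order.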

Author

Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es

Details

The cv.isb.splsdacox function performs cross-validation for the iterative single-block sparse partial least squares discriminant analysis Cox model (sPLS-DACOX). Cross-validation evaluates different hyperparameter combinations, including the number of components (up to max.ncomp) and the number of variables selected (vector). The function systematically evaluates models across multiple runs and folds to determine the best configuration. It allows flexibility in metrics, preprocessing steps (centering, scaling, variance filtering), and stopping criteria.
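
To make the idea of "hyperparameter combinations" concrete, here is a minimal base R sketch of a search grid. It is not the package's actual search procedure: the even spacing of candidate variable counts and the value chosen for the upper bound are assumptions made for illustration (MAX_NVAR defaults to NULL in the signature).

```r
# Values mirroring the documented defaults, except max_nvar,
# which is a hypothetical choice for this sketch
max_ncomp <- 8
min_nvar <- 10
max_nvar <- 50
n_cut_points <- 5

# Assumption: candidate variable counts are spread evenly between the bounds
nvar_candidates <- unique(round(seq(min_nvar, max_nvar, length.out = n_cut_points)))

# Every (number of components, number of variables) pair in the grid
grid <- expand.grid(ncomp = seq_len(max_ncomp), nvar = nvar_candidates)

head(grid)
nrow(grid)  # number of combinations in this illustrative grid
```

Each row of such a grid would then be scored by cross-validation, and the best-scoring row gives the analogue of opt.comp and opt.nvar.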

For each run, the dataset is split into k_folds folds, and each fold serves in turn as the test set while the remaining folds form the training set. Various metrics, such as AIC, C-Index, Integrated Brier Score, and AUC, are computed to assess model performance. The function identifies the optimal hyperparameters that yield the best performance based on the selected evaluation metrics.
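
The run-and-fold structure can be sketched in base R as below. This is a simplified illustration, assuming plain random fold assignment; the package may partition the data differently (for example, balancing events across folds), and the sample size here is hypothetical.

```r
set.seed(123)
n_obs <- 30     # hypothetical number of observations
n_run <- 3      # documented default for n_run
k_folds <- 10   # documented default for k_folds

# For each run, shuffle a balanced vector of fold labels (1..k_folds)
fold_ids <- lapply(seq_len(n_run), function(run) {
  sample(rep(seq_len(k_folds), length.out = n_obs))
})

# Row indices for run 1, fold 1: that fold is the test set,
# all other rows form the training set
test_idx <- which(fold_ids[[1]] == 1)
train_idx <- setdiff(seq_len(n_obs), test_idx)

length(test_idx)
length(train_idx)
```

Repeating the split over several runs gives n_run * k_folds fitted models per hyperparameter combination, which is why reducing n_run and k_folds shortens the run time considerably.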

Additionally, the function offers options to choose the AUC evaluation method (pred.method), to return all models computed during cross-validation (return_models), and to run in parallel (PARALLEL). The user can also control the verbosity of output messages (verbose) and set the minimum threshold for Events Per Variable (MIN_EPV).
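
As a rough illustration of what an Events Per Variable threshold implies, the following base R sketch computes how many predictors a final Cox model could hold for a given number of events. The event vector is hypothetical, and how the package enforces MIN_EPV internally is not described here.

```r
# Hypothetical event indicator: 1 = event, 0 = censored
event <- c(1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1)

n_events <- sum(event)
min_epv <- 5   # documented default for MIN_EPV

# Events per variable for a model with a given number of predictors
epv <- function(n_events, n_predictors) n_events / n_predictors

# Largest number of predictors that keeps EPV at or above the threshold
max_predictors <- floor(n_events / min_epv)

n_events
max_predictors
epv(n_events, max_predictors)
```

With few events, the threshold quickly limits how many components can enter the final Cox model, which is worth checking before choosing max.ncomp.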

Examples

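library(Coxmos)

# Load the example multi-omic data shipped with the package
data("X_multiomic")
data("Y_multiomic")
set.seed(123)

# Keep 25% of the samples and the first 20 variables of each block
index_train <- caret::createDataPartition(Y_multiomic$event, p = .25, list = FALSE, times = 1)
X_train <- X_multiomic
X_train$mirna <- X_train$mirna[index_train, 1:20]
X_train$proteomic <- X_train$proteomic[index_train, 1:20]
Y_train <- Y_multiomic[index_train, ]

# Number of variables to select in each block
vector <- list()
vector$mirna <- c(10)
vector$proteomic <- c(10)

# Small settings (1 component, 1 run, 3 folds) so the example runs quickly
cv.isb.splsdacox_model <- cv.isb.splsdacox(X_train, Y_train, max.ncomp = 1, vector = vector,
                                           n_run = 1, k_folds = 3,
                                           x.center = TRUE, x.scale = TRUE)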
