Coxmos (version 1.1.3)

getTrainTest

Description

Splits input data (X and Y) into training and test sets for survival analysis while keeping the proportion of events balanced between the two sets. Supports a single split or repeated splits (times > 1), and accepts multiblock data (a list of matrices or data.frames) as X.

Usage

getTrainTest(X, Y, p = 0.8, times = 1, seed = 123)

Value

  • If times = 1: A list with:

      • X_train: Training features.

      • Y_train: Training survival data.

      • X_test: Test features.

      • Y_test: Test survival data.

  • If times > 1: A named list of length times, each element containing the above structure.

Arguments

X

Numeric matrix, data.frame or list of matrices or data.frames. Predictor variables (features). Rows are samples, columns are variables.

Y

Numeric matrix or data.frame. Response variables. The object must have two columns named "time" and "event". In the event column, accepted values are 0 or FALSE for censored observations and 1 or TRUE for events.

p

Numeric (0 < p < 1). Proportion of samples to allocate to the training set (default: 0.8).

times

Integer. Number of repeated train/test splits to generate (default: 1).

seed

Integer. Random seed for reproducibility (default: 123).
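
The X and Y formats described above can be illustrated with toy data. This is only a sketch in base R: the object names (X, X_multi, Y), the block names (genes, proteins) and the simulated values are made up for illustration, and giving X and Y matching row names is an assumption rather than something this page states.

```r
# Toy inputs in the shapes described under Arguments (all values simulated).
set.seed(42)
n <- 20

# Single-block X: rows are samples, columns are variables.
X <- matrix(rnorm(n * 3), nrow = n,
            dimnames = list(paste0("sample", 1:n), paste0("gene", 1:3)))

# Multiblock X: a named list of matrices sharing the same samples (rows).
X_multi <- list(
  genes    = X,
  proteins = matrix(rnorm(n * 2), nrow = n,
                    dimnames = list(rownames(X), paste0("prot", 1:2)))
)

# Y: two columns named "time" and "event" (1 = event, 0 = censored).
Y <- data.frame(
  time  = round(rexp(n, rate = 0.1), 1),
  event = rbinom(n, size = 1, prob = 0.4),
  row.names = rownames(X)
)

# Basic sanity checks before splitting.
stopifnot(
  identical(colnames(Y), c("time", "event")),
  all(Y$event %in% c(0, 1)),
  nrow(X) == nrow(Y),
  all(vapply(X_multi, nrow, integer(1)) == nrow(Y))
)
```

Objects shaped like these could then be passed as the X and Y arguments of getTrainTest().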

Author

Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es

Details

This function uses caret::createDataPartition() to partition the data while preserving the proportion of events (e.g., deaths) in both training and test sets. It is designed for survival data where Y must contain an event column (binary: 1=event, 0=censored).
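
To make the idea of an event-preserving split concrete, here is a minimal base R sketch. It is not the Coxmos implementation and does not call caret::createDataPartition(); the helper name stratified_split() is hypothetical, and it simply samples a fraction p of the rows within each event stratum, using simulated data.

```r
# Hypothetical helper (not part of Coxmos): event-stratified train/test split.
stratified_split <- function(X, Y, p = 0.8, seed = 123) {
  set.seed(seed)
  event <- as.integer(Y[, "event"])
  # Within each stratum (censored, event), draw round(p * n_stratum) rows.
  train_idx <- unlist(lapply(split(seq_along(event), event), function(idx) {
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  }))
  train_idx <- sort(unname(train_idx))
  list(
    X_train = X[train_idx, , drop = FALSE],
    Y_train = Y[train_idx, , drop = FALSE],
    X_test  = X[-train_idx, , drop = FALSE],
    Y_test  = Y[-train_idx, , drop = FALSE]
  )
}

# Simulated data: 100 samples, 30 events and 70 censored observations.
set.seed(1)
X <- matrix(rnorm(100 * 5), nrow = 100,
            dimnames = list(NULL, paste0("var", 1:5)))
Y <- data.frame(time = rexp(100), event = rep(c(1, 0), times = c(30, 70)))

res <- stratified_split(X, Y, p = 0.8)

# Compare the event proportion in each partition with the full data.
c(all = mean(Y$event),
  train = mean(res$Y_train$event),
  test = mean(res$Y_test$event))
```

Because sampling happens separately inside each stratum, the event proportion in both partitions stays close to that of the full data, which is the property the function description refers to.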

Examples

# Single split (80% training, 20% test)
data(X_proteomic, Y_proteomic)
lst <- getTrainTest(X_proteomic, Y_proteomic, p = 0.8)

# Repeated splits (3x)
lst_repeats <- getTrainTest(X_proteomic, Y_proteomic, p = 0.7, times = 3)
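
The returned objects can be inspected as sketched below. This assumes the Coxmos package and its bundled X_proteomic and Y_proteomic data are installed, and that the list elements carry the names given in the Value section; the specific checks are illustrative only.

```r
library(Coxmos)
data(X_proteomic, Y_proteomic)

# Single split: a list with X_train, Y_train, X_test and Y_test.
lst <- getTrainTest(X_proteomic, Y_proteomic, p = 0.8)
names(lst)

# Partition sizes.
nrow(lst$X_train)
nrow(lst$X_test)

# Event proportion in each partition (expected to be similar).
mean(lst$Y_train[, "event"])
mean(lst$Y_test[, "event"])

# Repeated splits: one element per split, each with the same structure.
lst_repeats <- getTrainTest(X_proteomic, Y_proteomic, p = 0.7, times = 3)
length(lst_repeats)
names(lst_repeats[[1]])
```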