pencox_baseline: Estimation of a penalized Cox model with baseline covariates only

Description

This function estimates a penalized Cox model in which only baseline covariates are included as predictors. When n.boots > 0, it then runs a bootstrap optimism correction procedure that is used to validate the predictive performance of the model.

Usage

pencox_baseline(data, formula, penalty = "ridge", standardize = TRUE,
  penalty.factor = 1, n.alpha.elnet = 11, n.folds.elnet = 5,
  n.boots = 0, n.cores = 1, verbose = TRUE)

Value

A list containing the following objects:

call: the function call;
pcox.orig: the penalized Cox model fitted on the original dataset;
surv.data: a data frame with the survival data;
X.orig: a data frame with the design matrix used to estimate the Cox model;
n.boots: number of bootstrap samples;
boot.ids: a list with the ids of bootstrapped subjects (when n.boots > 0);
pcox.boot: a list where each element is a fitted penalized Cox model for a given bootstrap sample (when n.boots > 0).
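
To make the structure of boot.ids more concrete, the following minimal sketch draws subject ids with replacement in base R. It is only an illustration of resampling at the subject level: the ids vector and the number of bootstrap samples are hypothetical, and the code is not taken from the package internals.

```r
set.seed(1)
ids = 1:10     # hypothetical subject ids
n.boots = 3    # hypothetical number of bootstrap samples

# one vector of resampled ids per bootstrap sample
boot.ids = lapply(seq_len(n.boots), function(b) {
  sample(ids, size = length(ids), replace = TRUE)
})
str(boot.ids)
```

Each element of the resulting list has the same length as ids, and some subjects typically appear more than once while others are left out.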

Arguments

data: a data frame with one row for each subject. It should contain at least a subject id (called id), the time-to-event outcome (time), and the binary censoring indicator (event), plus at least one covariate to be included in the linear predictor
formula: a formula specifying the variables in data to include as predictors in the penalized Cox model
penalty: the type of penalty function used for regularization. Default is 'ridge', other possible values are 'elasticnet' and 'lasso'
standardize: logical argument: should the covariates be standardized when included in the penalized Cox model? Default is TRUE
penalty.factor: a single value, or a vector of values, indicating whether the covariates (if any) should be penalized (1) or not (0). Default is penalty.factor = 1
n.alpha.elnet: number of alpha values for the two-dimensional grid of tuning parameters in elasticnet. Only relevant if penalty = 'elasticnet'. Default is 11, so that the resulting alpha grid is c(1, 0.9, 0.8, ..., 0.1, 0)
n.folds.elnet: number of folds to be used for the selection of the tuning parameter in elasticnet. Only relevant if penalty = 'elasticnet'. Default is 5
n.boots: number of bootstrap samples to be used in the bootstrap optimism correction procedure. If 0, no bootstrapping is performed
n.cores: number of cores to use to parallelize the computation of the bootstrap optimism correction procedure. If n.cores = 1 (default), no parallelization is done. Tip: you can use parallel::detectCores() to check how many cores are available on your computer
verbose: if TRUE (default and recommended value), information on the ongoing computations is printed in the console
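
As an illustration of the default alpha grid described for n.alpha.elnet, the following minimal sketch builds the grid in base R. It assumes the grid is equally spaced between 1 and 0, which matches the stated default of c(1, 0.9, 0.8, ..., 0.1, 0); how the function constructs the grid internally, or for other values of n.alpha.elnet, is not confirmed here.

```r
# Assumption: an equally spaced grid running from 1 down to 0
n.alpha.elnet = 11
alpha.grid = seq(1, 0, length.out = n.alpha.elnet)
print(alpha.grid)
```

Combined with n.folds.elnet, a larger grid means more cross-validated fits, so computing time for penalty = 'elasticnet' can be expected to grow with n.alpha.elnet.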

Author

Mirko Signorelli

References

Signorelli, M., Spitali, P., Al-Khalili Szigyarto, C., The MARK-MD Consortium, Tsonaka, R. (2021). Penalized regression calibration: a method for the prediction of survival outcomes using complex longitudinal and high-dimensional data. Statistics in Medicine, 40(27), 6178-6196. DOI: 10.1002/sim.9178

Examples

# generate example data
set.seed(1234)
p = 4 # number of longitudinal predictors
simdata = simulate_prclmm_data(n = 100, p = p, p.relev = 2, 
             seed = 123, t.values = c(0, 0.2, 0.5, 1, 1.5, 2))
# create a data frame with baseline measurements only
baseline.visits = simdata$long.data[which(!duplicated(simdata$long.data$id)),]
df = cbind(simdata$surv.data, baseline.visits)
df = df[ , -c(5:7)]

do.bootstrap = FALSE
# IMPORTANT: set do.bootstrap = TRUE to compute the optimism correction!
n.boots = ifelse(do.bootstrap, 100, 0)
more.cores = FALSE
# IMPORTANT: set more.cores = TRUE to speed computations up!
if (!more.cores) n.cores = 2
if (more.cores) {
   # identify number of available cores on your machine
   n.cores = parallel::detectCores()
   if (is.na(n.cores)) n.cores = 2
}

form = as.formula(~ baseline.age + marker1 + marker2
                     + marker3 + marker4)
base.pcox = pencox_baseline(data = df, 
              formula = form, 
              n.boots = n.boots, n.cores = n.cores) 
ls(base.pcox)
