pencox_baseline: Estimation of a penalized Cox model with baseline covariates only

Description

This function estimates a penalized Cox model in which only baseline covariates are included as predictors. When n.boots > 0, it also refits the model on bootstrap samples, as required by the bootstrap optimism correction procedure used to validate the predictive performance of the model
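The general idea behind a bootstrap optimism correction can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used by pencox_baseline: it fits an unpenalized Cox model with survival::coxph on the lung dataset shipped with the survival package, uses the concordance index as the performance measure, and the object names (c.app, optimism, c.corrected) are hypothetical.

```r
library(survival)

set.seed(1)
# Illustrative data only (assumption): lung dataset from the survival package
d <- na.omit(lung[, c("time", "status", "age", "ph.ecog")])

# Apparent performance: model fitted and evaluated on the same data
fit <- coxph(Surv(time, status) ~ age + ph.ecog, data = d)
c.app <- concordance(fit)$concordance

B <- 20  # number of bootstrap samples, kept small for speed
optimism <- numeric(B)
for (b in seq_len(B)) {
  # resample subjects with replacement and refit the model
  idx <- sample(nrow(d), replace = TRUE)
  d.b <- d[idx, ]
  fit.b <- coxph(Surv(time, status) ~ age + ph.ecog, data = d.b)
  # performance of the bootstrap model on its own sample ...
  c.boot <- concordance(fit.b)$concordance
  # ... and on the original data
  c.orig <- concordance(fit.b, newdata = d)$concordance
  optimism[b] <- c.boot - c.orig
}

# Optimism-corrected estimate: apparent performance minus average optimism
c.corrected <- c.app - mean(optimism)
c(apparent = c.app, corrected = c.corrected)
```

The corrected estimate is typically somewhat lower than the apparent one, because a model tends to look better on the data it was fitted to.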

Usage

pencox_baseline(data, formula, penalty = "ridge", standardize = TRUE,
  penalty.factor = 1, n.alpha.elnet = 11, n.folds.elnet = 5,
  n.boots = 0, n.cores = 1, verbose = TRUE)

Arguments

data

a data frame with one row for each subject. It should contain at least a subject id (called id), the time-to-event outcome (time), and the binary censoring indicator (event), plus at least one covariate to be included in the linear predictor

formula

a formula specifying the variables in data to include as predictors in the penalized Cox model

penalty

the type of penalty function used for regularization. Default is 'ridge', other possible values are 'elasticnet' and 'lasso'

standardize

logical argument: should the covariates be standardized when included in the penalized Cox model? Default is TRUE

penalty.factor

a single value, or a vector of values, indicating whether the covariates (if any) should be penalized (1) or not (0). Default is penalty.factor = 1

n.alpha.elnet

number of alpha values for the two-dimensional grid of tuning parameters in elasticnet. Only relevant if penalty = 'elasticnet'. Default is 11, so that the resulting alpha grid is c(1, 0.9, 0.8, ..., 0.1, 0)

n.folds.elnet

number of folds to be used for the selection of the tuning parameter in elasticnet. Only relevant if penalty = 'elasticnet'. Default is 5

n.boots

number of bootstrap samples to be used in the bootstrap optimism correction procedure. If 0, no bootstrapping is performed

n.cores

number of cores to use to parallelize the computation of the bootstrap optimism correction procedure. If n.cores = 1 (default), no parallelization is done. Pro tip: you can use parallel::detectCores() to check how many cores are available on your computer

verbose

if TRUE (default and recommended value), information on the ongoing computations is printed in the console
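As a rough illustration of two of the arguments above, the snippet below rebuilds the default alpha grid described for n.alpha.elnet and constructs a penalty.factor vector that leaves one covariate unpenalized. It is a sketch in base R, not code taken from the package. The covariate names come from the Examples section, and it assumes that the entries of penalty.factor follow the order of the covariates in the formula.

```r
# Default alpha grid for elasticnet: n.alpha.elnet equally spaced values from 1 to 0
n.alpha.elnet <- 11
alpha.grid <- seq(1, 0, length.out = n.alpha.elnet)
alpha.grid

# penalty.factor as a vector: 0 = not penalized, 1 = penalized.
# Assumption: entries follow the order of the covariates in the formula.
covariates <- c("baseline.age", "marker1", "marker2", "marker3", "marker4")
penalty.factor <- c(0, rep(1, length(covariates) - 1))
# show which covariate gets which factor (for readability only)
data.frame(covariate = covariates, penalty.factor = penalty.factor)
```

Here baseline.age would enter the model unpenalized, while the four markers would be penalized.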

Value

A list containing the following objects:

call: the function call;
pcox.orig: the penalized Cox model fitted on the original dataset;
surv.data: a data frame with the survival data;
X.orig: a data frame with the design matrix used to estimate the Cox model;
n.boots: number of bootstrap samples;
boot.ids: a list with the ids of bootstrapped subjects (when n.boots > 0);
pcox.boot: a list where each element is a fitted penalized Cox model for a given bootstrap sample (when n.boots > 0).
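To give an idea of the shape of an object like boot.ids, the sketch below draws subject ids with replacement, once per bootstrap sample. It is a base R illustration with hypothetical ids and assumes plain resampling of subjects with replacement. It does not reproduce the package's internal code.

```r
set.seed(1)
ids <- 1:10    # hypothetical subject ids
n.boots <- 3   # number of bootstrap samples

# one vector of resampled ids per bootstrap sample
boot.ids <- lapply(seq_len(n.boots), function(b) {
  sample(ids, size = length(ids), replace = TRUE)
})
str(boot.ids)
```

Each element has the same length as the original set of ids, and some subjects appear more than once while others are left out.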

References

Signorelli, M., Spitali, P., Al-Khalili Szigyarto, C., The MARK-MD Consortium, Tsonaka, R. (2021). Penalized regression calibration: a method for the prediction of survival outcomes using complex longitudinal and high-dimensional data. Statistics in Medicine, 40(27), 6178-6196. DOI: 10.1002/sim.9178

Examples

# NOT RUN {
# generate example data
set.seed(1234)
p = 4 # number of longitudinal predictors
simdata = simulate_prclmm_data(n = 100, p = p, p.relev = 2, 
             seed = 123, t.values = c(0, 0.2, 0.5, 1, 1.5, 2))
# create a data frame with baseline measurements only
baseline.visits = simdata$long.data[which(!duplicated(simdata$long.data$id)),]
df = cbind(simdata$surv.data, baseline.visits)
df = df[ , -c(5:7)]

do.bootstrap = FALSE
# IMPORTANT: set do.bootstrap = TRUE to compute the optimism correction!
n.boots = ifelse(do.bootstrap, 100, 0)
parallelize = FALSE
# IMPORTANT: set parallelize = TRUE to speed computations up!
if (!parallelize) n.cores = 1
if (parallelize) {
  # identify number of available cores on your machine
  n.cores = parallel::detectCores()
  if (is.na(n.cores)) n.cores = 1
}

form = as.formula(~ baseline.age + marker1 + marker2
                     + marker3 + marker4)
base.pcox = pencox_baseline(data = df, 
              formula = form, 
              n.boots = n.boots, n.cores = n.cores) 
ls(base.pcox)
# }