fit_mlpmms: Step 1 of PRC-MLPMM (estimation of the linear mixed models)

Description

This function performs the first step of the estimation of the PRC-MLPMM model proposed in Signorelli et al. (2021).

Usage

fit_mlpmms(y.names, fixefs, ranef.time, randint.items = TRUE, long.data,
  surv.data, t.from.base, n.boots = 0, n.cores = 1, verbose = TRUE,
  seed = 123, maxiter = 100, conv = rep(0.001, 3),
  lcmm.warnings = FALSE)

Value

A list containing the following objects:

call.info: a list containing the following function call information: call, y.names, fixefs, ranef.time, randint.items;
mlpmm.fits.orig: a list with the MLPMMs fitted on the original dataset (it should contain one MLPMM for each element of y.names);
df.sanitized: a sanitized version of the supplied long.data dataframe, without the longitudinal measurements that are taken after the event or after censoring;
n.boots: number of bootstrap samples;
boot.ids: a list with the ids of bootstrapped subjects (when n.boots > 0);
mlpmm.fits.boot: a list of lists, which contains the MLPMMs fitted on each bootstrapped dataset (when n.boots > 0).

Arguments

y.names: a list with the names of the response variables which the MLPMMs have to be fitted to. Each element in the list contains all the items used to reconstruct a latent biological process of interest
fixefs: a fixed effects formula for the model, where the time variable (specified also in ranef.time) is included as first element and within the function contrast(). Examples: ~ contrast(age), ~ contrast(age) + group + treatment
ranef.time: a character with the name of the time variable for which to include a shared random slope
randint.items: logical: should item-specific random intercepts be included in the MLPMMs? Default is TRUE. It can also be a vector, with different values for different elements of y.names
long.data: a data frame with the longitudinal predictors, including a variable called id with the subject ids
surv.data: a data frame with the survival data and (if relevant) additional baseline covariates. surv.data should at least contain a subject id (called id), the time to event outcome (time), and a binary event variable (event)
t.from.base: name of the variable containing time from baseline in long.data
n.boots: number of bootstrap samples to be used in the cluster bootstrap optimism correction procedure (CBOCP). If 0, no bootstrapping is performed
n.cores: number of cores to use to parallelize part of the computations. If n.cores = 1 (default), no parallelization is done. Pro tip: you can use parallel::detectCores() to check how many cores are available on your computer
verbose: if TRUE (default and recommended value), information on the ongoing computations is printed in the console
seed: random seed used for the bootstrap sampling. Default is seed = 123
maxiter: maximum number of iterations to use when calling the function multlcmm. Default is 100
conv: a vector containing the three convergence criteria (convB, convL and convG) to use when calling the function multlcmm. Default is c(1e-3, 1e-3, 1e-3)
lcmm.warnings: logical. If TRUE, a warning is printed every time the (strict) convergence criteria of the multlcmm function are not met. Default is FALSE
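
As a rough illustration of the shapes these arguments expect, the sketch below builds toy inputs in base R and runs a few consistency checks. It does not call fit_mlpmms. The item names (marker1_1, ...), the age column and all data values are made up for illustration; only the column names id, time and event come from the argument descriptions above.

```r
# Toy inputs (hypothetical values), built with base R only

# y.names: one character vector of item names per latent process
n.items = c(3, 2)
y.names = lapply(seq_along(n.items), function(i) {
  paste0('marker', i, '_', seq_len(n.items[i]))
})

# randint.items: a single logical, or one value per element of y.names
randint.items = c(TRUE, FALSE)

# conv: the three criteria (convB, convL, convG) passed on to multlcmm
conv = rep(1e-3, 3)

# surv.data: one row per subject with id, time and event
surv.data = data.frame(id = 1:3, time = c(2.5, 4.0, 1.2),
                       event = c(1, 0, 1))

# long.data: repeated measurements per subject, with id, the time
# variable (assumed here to be age) and the time from baseline
long.data = data.frame(
  id = rep(1:3, each = 2),
  age = c(10, 11, 12, 13, 9, 10),
  t.from.base = rep(c(0, 1), times = 3)
)

# one simulated column per item named in y.names
set.seed(1)
for (item in unlist(y.names)) long.data[[item]] = rnorm(nrow(long.data))

# basic consistency checks one might run before calling fit_mlpmms
stopifnot(all(unlist(y.names) %in% names(long.data)))
stopifnot(all(long.data$id %in% surv.data$id))
stopifnot(length(randint.items) %in% c(1, length(y.names)))
```

Checks of this kind are optional; they are only meant to catch mismatched item names or subject ids early.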

Author

Mirko Signorelli

Details

This function is essentially a wrapper around the multlcmm function that simplifies the estimation of several MLPMMs. Ensuring convergence of the algorithm implemented in multlcmm can be difficult, and it is hard to write a function that resolves such convergence problems automatically. fit_mlpmms returns a warning when estimation did not converge for one or more MLPMMs. If this happens, try changing the convergence criteria in conv or the relevant randint.items value. If this does not solve the problem, re-estimate the specific MLPMMs that did not converge directly with multlcmm, and address the convergence issues manually.
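
One possible way to act on such a warning programmatically is to collect warnings with base R's withCallingHandlers and retry with different settings. The sketch below is only a pattern: fit_stub is a hypothetical stand-in for the real fitting call, its warning text and convergence rule are invented, and the looser conv values in the retry are purely illustrative rather than recommended.

```r
# Hypothetical stand-in for a fitting call that warns on non-convergence;
# in practice this would be replaced by the real fit_mlpmms(...) call
fit_stub = function(conv) {
  if (max(conv) < 0.01) warning('estimation did not converge')
  list(conv = conv)
}

# evaluate a call, collecting warning messages instead of printing them
fit_with_warnings = function(expr) {
  msgs = character(0)
  value = withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart('muffleWarning')
    }
  )
  list(value = value, warnings = msgs)
}

# first attempt with the default criteria
res = fit_with_warnings(fit_stub(conv = rep(1e-3, 3)))

# if any warning was raised, retry once with different (illustrative) criteria
if (length(res$warnings) > 0) {
  res = fit_with_warnings(fit_stub(conv = rep(1e-2, 3)))
}
```

After the retry, res$warnings shows whether the second attempt still raised warnings, in which case fitting the affected models directly with multlcmm remains the fallback described above.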

References

Signorelli, M. (2023). pencal: an R Package for the Dynamic Prediction of Survival with Many Longitudinal Predictors. arXiv preprint: arXiv:2309.15600.

Signorelli, M., Spitali, P., Al-Khalili Szigyarto, C., The MARK-MD Consortium, Tsonaka, R. (2021). Penalized regression calibration: a method for the prediction of survival outcomes using complex longitudinal and high-dimensional data. Statistics in Medicine, 40 (27), 6178-6196. DOI: 10.1002/sim.9178.

Examples

# \donttest{
# generate example data
set.seed(123)
n.items = c(4,2,2,3,4,2)
simdata = simulate_prcmlpmm_data(n = 100, p = length(n.items),  
             p.relev = 3, n.items = n.items, 
             type = 'u+b', seed = 1)
 
# specify options for cluster bootstrap optimism correction
# procedure and for parallel computing 
do.bootstrap = FALSE
# IMPORTANT: set do.bootstrap = TRUE to compute the optimism correction!
n.boots = ifelse(do.bootstrap, 100, 0)
more.cores = FALSE
# IMPORTANT: set more.cores = TRUE to speed computations up!
if (!more.cores) n.cores = 2
if (more.cores) {
   # identify number of available cores on your machine
   n.cores = parallel::detectCores()
   if (is.na(n.cores)) n.cores = 2
}

# step 1 of PRC-MLPMM: estimate the MLPMMs
y.names = vector('list', length(n.items))
for (i in 1:length(n.items)) {
  y.names[[i]] = paste('marker', i, '_', 1:n.items[i], sep = '')
}

step1 = fit_mlpmms(y.names, fixefs = ~ contrast(age),  
                 ranef.time = age, randint.items = TRUE, 
                 long.data = simdata$long.data, 
                 surv.data = simdata$surv.data,
                 t.from.base = t.from.base,
                 n.boots = n.boots, n.cores = n.cores)
# }