
This function performs the first step in the estimation of the PRC-MLPMM model proposed in Signorelli et al. (2021).
fit_mlpmms(y.names, fixefs, ranef.time, randint.items = TRUE, long.data,
           surv.data, t.from.base, n.boots = 0, n.cores = 1, verbose = TRUE,
           seed = 123, maxiter = 100, conv = rep(0.001, 3),
           lcmm.warnings = FALSE)
A list containing the following objects:
call.info: a list containing the following function call information: call, y.names, fixefs, ranef.time, randint.items;
mlpmm.fits.orig: a list with the MLPMMs fitted on the original dataset (it should comprise as many MLPMMs as there are elements in y.names);
df.sanitized: a sanitized version of the supplied long.data data frame, without the longitudinal measurements that are taken after the event or after censoring;
n.boots: number of bootstrap samples;
boot.ids: a list with the ids of bootstrapped subjects (when n.boots > 0);
mlpmm.fits.boot: a list of lists, which contains the MLPMMs fitted on each bootstrapped dataset (when n.boots > 0).
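To make the df.sanitized element more concrete, the base R sketch below shows one way of dropping longitudinal measurements recorded after the event or censoring time. It is not the package's internal code: the toy data frames are invented for illustration, and keeping measurements taken exactly at the follow-up time (<= rather than <) is an assumption.

```r
# Toy longitudinal data: one row per subject and measurement time
long.data <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  t.from.base = c(0, 1, 3, 0, 2, 0),
  marker1_1 = c(5.1, 5.4, 6.0, 4.8, 4.9, 5.5)
)

# Toy survival data: follow-up time and event indicator per subject
surv.data <- data.frame(
  id = c(1, 2, 3),
  time = c(2, 2.5, 1),
  event = c(1, 0, 1)
)

# Attach each subject's follow-up time to their measurements
merged <- merge(long.data, surv.data[, c("id", "time")], by = "id")

# Keep only measurements taken no later than the event / censoring time
# (assumption: the boundary is inclusive)
sanitized <- merged[merged$t.from.base <= merged$time, names(long.data)]

sanitized
```

In this toy example, the measurement of subject 1 at time 3 is discarded because it was taken after that subject's event at time 2.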
y.names: a list with the names of the response variables to which the MLPMMs have to be fitted. Each element in the list contains all the items used to reconstruct a latent biological process of interest
fixefs: a fixed effects formula for the model, where the time variable (also specified in ranef.time) is included as the first element and within the function contrast(). Examples: ~ contrast(age), ~ contrast(age) + group + treatment
ranef.time: a character with the name of the time variable for which to include a shared random slope
randint.items: logical: should item-specific random intercepts be included in the MLPMMs? Default is TRUE. It can also be a vector, with different values for different elements of y.names
long.data: a data frame with the longitudinal predictors, including a variable called id with the subject ids
surv.data: a data frame with the survival data and (if relevant) additional baseline covariates. surv.data should at least contain a subject id (called id), the time to event outcome (time), and a binary event variable (event)
t.from.base: name of the variable containing time from baseline in long.data
n.boots: number of bootstrap samples to be used in the cluster bootstrap optimism correction procedure (CBOCP). If 0, no bootstrapping is performed
n.cores: number of cores to use to parallelize part of the computations. If n.cores = 1 (default), no parallelization is done. Pro tip: you can use parallel::detectCores() to check how many cores are available on your computer
verbose: if TRUE (default and recommended value), information on the ongoing computations is printed in the console
seed: random seed used for the bootstrap sampling. Default is seed = 123
maxiter: maximum number of iterations to use when calling the function multlcmm. Default is 100
conv: a vector containing the three convergence criteria (convB, convL and convG) to use when calling the function multlcmm. Default is c(1e-3, 1e-3, 1e-3)
lcmm.warnings: logical. If TRUE, a warning is printed every time the (strict) convergence criteria of the multlcmm function are not met. Default is FALSE
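The n.boots and seed arguments control a cluster bootstrap, in which whole subjects are resampled rather than individual rows. The base R sketch below illustrates that idea for a single bootstrap sample. It is only an illustration under stated assumptions, not pencal's implementation: the toy data and the boot.id column are hypothetical.

```r
set.seed(123)

# Toy longitudinal data with a varying number of visits per subject
long.data <- data.frame(
  id = rep(1:4, times = c(3, 2, 4, 1)),
  t.from.base = c(0, 1, 2, 0, 1, 0, 1, 2, 3, 0)
)

ids <- unique(long.data$id)

# Draw subjects (clusters), not rows, with replacement
boot.ids <- sample(ids, size = length(ids), replace = TRUE)

# Collect all rows of each drawn subject. A subject drawn twice appears
# twice, and each copy receives its own (hypothetical) boot.id so that the
# copies can be treated as distinct subjects.
boot.data <- do.call(rbind, lapply(seq_along(boot.ids), function(k) {
  rows <- long.data[long.data$id == boot.ids[k], ]
  rows$boot.id <- k
  rows
}))

head(boot.data)
```

Resampling at the subject level keeps each subject's repeated measurements together, which preserves the within-subject correlation that the MLPMMs are meant to model.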
Mirko Signorelli
This function is essentially a wrapper of the multlcmm function that has the goal of simplifying the estimation of several MLPMMs. Ensuring convergence of the algorithm implemented in multlcmm is sometimes difficult, and it is hard to write a function that can automatically solve these convergence problems. fit_mlpmms returns a warning when estimation did not converge for one or more MLPMMs. If this happens, try changing the convergence criteria in conv or the relevant randint.items value. If doing this doesn't solve the problem, it is recommended to re-estimate the specific MLPMMs for which estimation didn't converge directly with multlcmm, trying to manually solve the convergence issues.
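One way to organise the "try different convergence criteria" advice is a small retry loop that records whether a fit raised a warning. The sketch below is generic base R and makes two assumptions: that non-convergence is signalled as a standard R warning, and that toy_fit (a hypothetical stand-in, not a pencal function) can be swapped for a wrapper such as function(conv) fit_mlpmms(..., conv = conv).

```r
# Hypothetical stand-in for a model fit: it warns when any tolerance is
# stricter than 0.01, mimicking a fit that fails to converge.
toy_fit <- function(conv) {
  if (any(conv < 0.01)) warning("estimation did not converge")
  list(conv = conv)
}

# Try each set of convergence criteria in turn and return the first fit
# that completes without a warning (NULL if none does).
fit_with_retry <- function(fit_fun, conv_grid) {
  for (conv in conv_grid) {
    warned <- FALSE
    fit <- withCallingHandlers(
      fit_fun(conv),
      warning = function(w) {
        warned <<- TRUE                 # remember that a warning occurred
        invokeRestart("muffleWarning")  # and keep it out of the console
      }
    )
    if (!warned) return(list(fit = fit, conv = conv))
  }
  NULL
}

res <- fit_with_retry(toy_fit, list(rep(0.001, 3), rep(0.01, 3)))
res$conv
```

Loosening the criteria makes convergence easier to declare but also less strict, so any fit obtained this way deserves a closer look before it is used in the later steps.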
Signorelli, M. (2023). pencal: an R Package for the Dynamic Prediction of Survival with Many Longitudinal Predictors. arXiv preprint: arXiv:2309.15600.
Signorelli, M., Spitali, P., Al-Khalili Szigyarto, C., The MARK-MD Consortium, Tsonaka, R. (2021). Penalized regression calibration: a method for the prediction of survival outcomes using complex longitudinal and high-dimensional data. Statistics in Medicine, 40 (27), 6178-6196. DOI: 10.1002/sim.9178
simulate_prcmlpmm_data, summarize_mlpmms (step 2), fit_prcmlpmm (step 3), performance_prc
# \donttest{
# generate example data
set.seed(123)
n.items = c(4, 2, 2, 3, 4, 2)
simdata = simulate_prcmlpmm_data(n = 100, p = length(n.items),
                                 p.relev = 3, n.items = n.items,
                                 type = 'u+b', seed = 1)
# specify options for cluster bootstrap optimism correction
# procedure and for parallel computing
do.bootstrap = FALSE
# IMPORTANT: set do.bootstrap = TRUE to compute the optimism correction!
n.boots = ifelse(do.bootstrap, 100, 0)
more.cores = FALSE
# IMPORTANT: set more.cores = TRUE to speed computations up!
if (!more.cores) n.cores = 2
if (more.cores) {
  # identify number of available cores on your machine
  n.cores = parallel::detectCores()
  if (is.na(n.cores)) n.cores = 2
}
# step 1 of PRC-MLPMM: estimate the MLPMMs
y.names = vector('list', length(n.items))
for (i in seq_along(n.items)) {
  y.names[[i]] = paste('marker', i, '_', 1:n.items[i], sep = '')
}
step1 = fit_mlpmms(y.names, fixefs = ~ contrast(age),
                   ranef.time = age, randint.items = TRUE,
                   long.data = simdata$long.data,
                   surv.data = simdata$surv.data,
                   t.from.base = t.from.base,
                   n.boots = n.boots, n.cores = n.cores)
# }
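As a small complement to the example, the sketch below builds the same y.names list with lapply and pairs it with a vector-valued randint.items, which the argument description allows. The particular TRUE/FALSE pattern is arbitrary and purely illustrative; it is not a recommendation from the package.

```r
n.items <- c(4, 2, 2, 3, 4, 2)

# One character vector of item names per latent process,
# e.g. "marker2_1" and "marker2_2" for the second process
y.names <- lapply(seq_along(n.items), function(i) {
  paste0("marker", i, "_", seq_len(n.items[i]))
})

y.names[[2]]

# Hypothetical per-process choice of item-specific random intercepts:
# one logical value for each element of y.names
randint.items <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)

# The two objects must line up element by element
stopifnot(length(randint.items) == length(y.names))
```

Both objects could then be passed to fit_mlpmms in place of the values used above, for instance when a convergence warning points to one specific MLPMM.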