simulate_prcmlpmm_data: Simulate data that can be used to fit the PRC-MLPMM model

Description

This function simulates a survival outcome from longitudinal predictors. Specifically, the longitudinal predictors are simulated from multivariate latent process mixed models (MLPMMs), and the survival outcome is simulated from a Weibull model in which the time to event depends on the random effects of the MLPMMs. It implements the simulation method used in Signorelli et al. (2021).
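To make the Weibull step more concrete, the following base-R sketch draws event times by inverse-transform sampling and then applies uniform censoring. It is only an illustration under stated assumptions: the survival function S(t) = exp(-lambda * exp(eta) * t^nu), the linear predictor `eta` and the uniform censoring mechanism are choices made for this example, and they are not taken from the pencal source code. See Signorelli et al. (2021) for the actual model.

```r
# Sketch only: parameterization and censoring mechanism are assumptions
set.seed(1)
n <- 100
lambda <- 0.2   # plays the role of the `lambda` argument
nu <- 2         # plays the role of the `nu` argument

# hypothetical subject-specific linear predictor; in the actual method it
# would be built from the random effects of the MLPMMs
eta <- rnorm(n, mean = 0, sd = 0.5)

# inverse-transform sampling from S(t) = exp(-lambda * exp(eta) * t^nu)
u <- runif(n)
event.time <- (-log(u) / (lambda * exp(eta)))^(1 / nu)

# assumed: censoring times drawn uniformly within cens.range
cens.range <- c(0.5, 10)
cens.time <- runif(n, min = cens.range[1], max = cens.range[2])

# observed time and event indicator (1 = event observed, 0 = right-censored)
time <- pmin(event.time, cens.time)
event <- as.integer(event.time <= cens.time)
perc.censored <- mean(event == 0)
```

Under this assumed parameterization, larger values of `lambda` or of `eta` shorten the simulated event times, which in turn lowers the share of censored subjects.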

Usage

simulate_prcmlpmm_data(n = 100, p = 5, p.relev = 2, n.items = c(3, 2,
  3, 4, 1), type = "u", lambda = 0.2, nu = 2, seed = 1,
  base.age.range = c(3, 5), cens.range = c(0.5, 10), t.values = c(0, 0.5,
  1, 2))

Arguments

n

sample size

p

number of longitudinal latent processes

p.relev

number of latent processes that are associated with the survival outcome (min: 1, max: p)

n.items

number of items that are observed for each latent process of interest. It must be either a scalar, or a vector of length p

type

the type of relation between the longitudinal outcomes and survival time. Two values can be used: 'u' refers to the PRC-MLPMM(U) model, and 'u+b' to the PRC-MLPMM(U+B) model presented in Section 2.3 of Signorelli et al. (2021). See the article for the mathematical details

lambda

Weibull location parameter, positive

nu

Weibull scale parameter, positive

seed

random seed (defaults to 1)

base.age.range

range for age at baseline (set it equal to c(0, 0) if you want all subjects to enter the study at the same age)

cens.range

range for censoring times

t.values

vector specifying the time points at which longitudinal measurements are collected (NB: for simplicity, this function assumes a balanced design; however, pencal is designed to work with both balanced and unbalanced designs)
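As a rough illustration of how n.items, base.age.range and t.values shape the longitudinal data, the base-R sketch below builds the marker names and the skeleton of a balanced long-format data frame. The markerx_y naming follows the comment in the Examples section. The uniform draw of baseline ages and the definition of `age` as baseline age plus time from baseline are assumptions made for this sketch, not a description of the package internals.

```r
# Sketch only: mimics the layout of the simulated data, not the pencal code
set.seed(1)
n <- 4
p <- 2
n.items <- c(3, 2)
base.age.range <- c(3, 5)
t.values <- c(0, 0.5, 1, 2)

# a scalar n.items is expanded to one value per latent process
if (length(n.items) == 1) n.items <- rep(n.items, p)
stopifnot(length(n.items) == p)

# markerx_y is the y-th item of latent process x
marker.names <- unlist(lapply(seq_len(p), function(x) {
  paste0("marker", x, "_", seq_len(n.items[x]))
}))

# assumption: baseline ages are drawn uniformly within base.age.range
base.age <- runif(n, min = base.age.range[1], max = base.age.range[2])

# balanced design: every subject is measured at every value in t.values
skeleton <- data.frame(
  id = rep(seq_len(n), each = length(t.values)),
  base.age = rep(base.age, each = length(t.values)),
  t.from.base = rep(t.values, times = n)
)
# assumption: age at each visit = baseline age + time from baseline
skeleton$age <- skeleton$base.age + skeleton$t.from.base
```

With these inputs the skeleton has n * length(t.values) rows, and one column per entry of `marker.names` would be added to hold the simulated items.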

Value

A list containing the following elements:

a dataframe long.data with data on the longitudinal predictors, including a subject id (id), baseline age (base.age), time from baseline (t.from.base) and the longitudinal biomarkers;
a dataframe surv.data with the survival data: a subject id (id), baseline age (baseline.age), the time to event outcome (time) and a binary vector (event) that is 1 if the event is observed, and 0 in case of right-censoring;
perc.cens, the proportion of censored individuals in the simulated dataset.
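Because event is coded as 1 for an observed event and 0 for right-censoring, the censoring proportion can also be recomputed directly from the survival data. The sketch below uses a small hypothetical data frame with the documented columns as a stand-in for surv.data; the values are made up for illustration.

```r
# hypothetical stand-in with the documented columns id, time and event
surv.data <- data.frame(
  id = 1:6,
  time = c(2.1, 0.8, 5.0, 3.3, 1.2, 4.7),
  event = c(1, 0, 1, 1, 0, 1)
)

# share of right-censored subjects and number of observed events
prop.censored <- mean(surv.data$event == 0)
n.events <- sum(surv.data$event == 1)
```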

References

Signorelli, M., Spitali, P., Al-Khalili Szigyarto, C., The MARK-MD Consortium, Tsonaka, R. (2021). Penalized regression calibration: a method for the prediction of survival outcomes using complex longitudinal and high-dimensional data. Statistics in Medicine, 40 (27), 6178-6196. DOI: 10.1002/sim.9178

Examples

# NOT RUN {
# generate example data
simdata = simulate_prcmlpmm_data(n = 40, p = 6,  
             p.relev = 3, n.items = c(3,4,2,5,4,2), 
             type = 'u+b', seed = 1)

# names of the longitudinal outcomes:
names(simdata$long.data)
# markerx_y is the y-th item for latent process (LP) x
# we have 6 latent processes of interest, and for LP1 
# we measure 3 items, for LP2 4, for LP3 2 items, and so on

# visualize trajectories of marker1_1
library(ptmixed)
make.spaghetti(x = age, y = marker1_1, 
               id = id, group = id,
               data = simdata$long.data, 
               legend.inset = - 1)

# proportion of censored subjects
simdata$censoring.prop
# visualize KM estimate of survival
library(survival)
surv.obj = Surv(time = simdata$surv.data$time, 
                event = simdata$surv.data$event)
kaplan <- survfit(surv.obj ~ 1,  
                 type="kaplan-meier")
plot(kaplan)
# }