rec.ev.sim: Generate a cohort with recurrent events

Description

Simulation of cohorts in the context of recurrent event survival analysis, including several covariates, individual heterogeneity and periods at risk before and after the initial time of follow-up.

Recurrent event data are a type of multiple event data in which the subject can experience repeated occurrences of the same type of event (Kelly, 2000), for example repeated asthma attacks or sick leave episodes. In practice, the hazard of a recurrent event can vary with the number of previous occurrences, in both shape and intensity (Reis, 2011; Navarro, 2012). However, simulations based on a mixture of distributions with different baseline hazard rates are rare (Bender, 2005; Metcalfe, 2006).

In a recurrent event context, each subject can present a different number of episodes. We speak of episodes (or occurrences) rather than events, since each occurrence is a new episode of the same event. This package assumes that there is a different and independent distribution $Y_k$ for each value of $k$, the number of the episode at risk. The simulation process for each $Y_k$ is the same as in the multiple events situation (see mult.ev.sim), except that a subject cannot be at risk for the $k$-th episode without having had the $(k-1)$-th.
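
The sequential logic described above can be sketched in base R. This is a simplified illustration, not survsim's internal code: the function sim_subject is hypothetical, gap times are assumed exponential with one rate per episode (standing in for the $Y_k$ distributions), the last rate is assumed to be reused when $k$ exceeds the number of rates supplied, and time not at risk between episodes is ignored.

```r
# Hypothetical sketch (not survsim's implementation): simulate the episodes of
# one subject. Episode k has its own event and censoring rate (exponential here
# for simplicity); ASSUMPTION: the last rate is reused once k exceeds the vector.
sim_subject <- function(foltime, ev.rate, cens.rate, max.ep = Inf) {
  t <- 0                                  # current time on the follow-up scale
  k <- 1                                  # number of the episode at risk
  rows <- list()
  repeat {
    i <- min(k, length(ev.rate))          # distribution index for episode k
    y <- rexp(1, ev.rate[i])              # time to the k-th event
    cens <- rexp(1, cens.rate[i])         # time to drop-out while at risk for k
    gap <- min(y, cens, foltime - t)      # observed duration of this episode
    status <- as.integer(y <= cens && t + y <= foltime)
    rows[[k]] <- data.frame(episode = k, start = t, stop = t + gap,
                            status = status)
    t <- t + gap
    # Only a subject who had the k-th event can be at risk for episode k + 1
    if (status == 0 || k >= max.ep) break
    k <- k + 1
  }
  do.call(rbind, rows)
}

set.seed(1)
sim_subject(foltime = 1825, ev.rate = c(1/700, 1/500),
            cens.rate = c(1/2000, 1/2000))
```

Each row's start equals the previous row's stop, and every row except possibly the last ends in an event, which is exactly the constraint that episode $k$ requires episode $k-1$.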

Usage

rec.ev.sim(n, foltime, dist.ev, anc.ev, beta0.ev, dist.cens=rep("weibull",
length(beta0.cens)), anc.cens, beta0.cens, z=NA, beta=NA, x=NA, lambda=NA, 
max.ep=Inf, priskb=0, max.old=0)

Arguments

n

integer value indicating the desired size of the cohort to be simulated.

foltime

real number that indicates the maximum time of follow-up of the simulated cohort.

dist.ev

vector of arbitrary size indicating the time to event distributions, with possible values "weibull" for the Weibull distribution, "lnorm" for the log-normal distribution and "llogistic" for the log-logistic distribution. …

anc.ev

vector of arbitrary size of real components containing the ancillary parameters for the time to event distributions.

beta0.ev

vector of arbitrary size of real components containing the $\beta_0$ parameters for the time to event distributions.

dist.cens

vector of arbitrary size indicating the time to censoring distributions, with possible values "weibull" for the Weibull distribution, "lnorm" for the log-normal distribution and "llogistic" for the log-logistic distribution. …

anc.cens

vector of arbitrary size of real components containing the ancillary parameters for the time to censoring distributions.

beta0.cens

vector of arbitrary size of real components containing the $\beta_0$ parameters for the time to censoring distributions.
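
To show how an ancillary parameter and a $\beta_0$ can define each of the three families, here is a minimal sketch. It assumes the usual accelerated failure time (survreg-style) parametrization $\log T = \beta_0 + \sigma W$, with the ancillary parameter playing the role of $\sigma$; this page does not state survsim's exact parametrization, and draw_time is a hypothetical helper.

```r
# Hypothetical helper. ASSUMPTION: log(T) = beta0 + anc * W, where W is a
# standard minimum extreme value (weibull), normal (lnorm) or logistic
# (llogistic) variable. Only base R random generators are used.
draw_time <- function(n, dist, anc, beta0) {
  switch(dist,
    weibull   = rweibull(n, shape = 1 / anc, scale = exp(beta0)),
    lnorm     = rlnorm(n, meanlog = beta0, sdlog = anc),
    llogistic = exp(rlogis(n, location = beta0, scale = anc)),
    stop("unknown distribution: ", dist))
}

# First episode of the example below: lnorm with anc = 1.498, beta0 = 7.195
set.seed(123)
t1 <- draw_time(10000, "lnorm", 1.498, 7.195)
median(t1)   # should be close to exp(7.195), roughly 1333 days
```

Under this parametrization the median is $e^{\beta_0}$ for "lnorm" and "llogistic" and $e^{\beta_0}(\log 2)^{\sigma}$ for "weibull", which gives a quick way to read the example's $\beta_0$ values on the time scale.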

z

vector with three elements that contains information relative to a random effect used to introduce individual heterogeneity. The first element indicates the distribution: "unif" stands for a uniform distribution, "gamma" …

beta

list of vectors indicating the effect of the corresponding covariate. The number of vectors in beta must match the number of covariates, and the length of each vector must match the number of episodes considered. Its default value is NA, indica…

x

list of vectors indicating the distribution and parameters of any covariate that the user needs to introduce in the simulated cohort. The possible distributions are "normal" for a normal distribution, "unif" for a uniform distribution, …
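
The specification vectors used for z and x in the example (c("unif", 0.8, 1.2) and list(c("bern", 0.5))) can be turned into per-subject draws as sketched below. The helper draw_spec is hypothetical, the parameter meanings for "normal" (mean, standard deviation) are an assumption, and how survsim combines z and the linear predictor with the baseline distributions is not shown here.

```r
# Hypothetical helper: n random draws from a specification vector such as
# c("bern", 0.5) or c("unif", 0.8, 1.2). ASSUMPTION: "normal" takes (mean, sd).
draw_spec <- function(n, spec) {
  p <- as.numeric(spec[-1])               # parameters are stored as strings
  switch(spec[1],
    normal = rnorm(n, mean = p[1], sd = p[2]),
    unif   = runif(n, min = p[1], max = p[2]),
    bern   = rbinom(n, size = 1, prob = p[1]),
    stop("distribution not handled in this sketch: ", spec[1]))
}

set.seed(2)
n <- 500
z <- draw_spec(n, c("unif", 0.8, 1.2))                # individual heterogeneity
x <- sapply(list(c("bern", 0.5)), draw_spec, n = n)   # one column per covariate
beta <- list(c(-0.4, -0.5, -0.6, -0.7))               # one vector per covariate

# Linear predictor for episode k: each covariate times its k-th beta, summed
k <- 2
lp <- drop(x %*% sapply(beta, `[`, k))
```

Indexing each beta vector by the episode number is what lets a covariate effect change from one episode to the next, as in the example's -0.4, -0.5, -0.6 and -0.7.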

lambda

real number indicating the mean duration of each event or discontinuous risk time, assumed to follow a zero-truncated Poisson distribution. Its default value is NA, corresponding to the case where the duration of each event or discontinuous risk…
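
A zero-truncated Poisson draw can be obtained by inverse-CDF sampling restricted to values of at least 1, as in the sketch below (rztpois is a hypothetical name). It treats lambda as the rate parameter of the underlying Poisson distribution; whether survsim's lambda is that parameter or the mean of the truncated variable is an assumption here, and the two differ because the truncated mean is lambda / (1 - exp(-lambda)).

```r
# Hypothetical sketch: zero-truncated Poisson via inverse-CDF sampling.
# ASSUMPTION: lambda is the rate parameter of the untruncated Poisson.
rztpois <- function(n, lambda) {
  p0 <- dpois(0, lambda)               # probability mass removed at zero
  u <- runif(n, min = p0, max = 1)     # uniforms over the non-zero part of the CDF
  qpois(u, lambda)                     # smallest k >= 1 with P(X <= k) >= u
}

set.seed(5)
long <- rztpois(1000, lambda = 2.18)   # first value of lambda in the example
mean(long)                             # near 2.18 / (1 - exp(-2.18)), not 2.18
```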

max.ep

integer value indicating the maximum permitted number of episodes per subject. Its default value is Inf, i.e. the number of episodes per subject is not limited.

priskb

proportion of subjects at risk prior to the start of follow-up, defaults to 0.

max.old

maximum time at risk prior to the start of follow-up, defaults to 0.
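
The roles of priskb and max.old, and the distinction between real and observed episode numbers, can be illustrated with the sketch below. It is not survsim's code: the uniform distribution of the time at risk before follow-up, the hypothetical function observe_history, and its history-scale columns hstart and hstop are all assumptions made for illustration.

```r
# Hypothetical sketch (not survsim's code) of the pre-follow-up risk period.
set.seed(3)
n <- 10
priskb <- 0.5; max.old <- 730
at.risk.before <- runif(n) < priskb                   # roughly priskb * n subjects
# ASSUMPTION: time at risk before follow-up is uniform on (0, max.old)
old <- ifelse(at.risk.before, runif(n, 0, max.old), 0)

# One subject's history on a scale starting at the onset of risk, with
# hypothetical columns hstart/hstop already censored at old + foltime.
# Only episodes still open when follow-up starts (time `old`) are observed.
observe_history <- function(hist, old) {
  obs <- hist[hist$hstop > old, ]
  obs$start <- pmax(obs$hstart - old, 0)   # shift to the follow-up time scale
  obs$stop  <- obs$hstop - old
  obs$obs.episode <- seq_len(nrow(obs))    # observed episodes renumbered from 1
  obs
}

hist <- data.frame(real.episode = 1:3, hstart = c(0, 200, 650),
                   hstop = c(200, 650, 900), status = c(1, 1, 0))
observe_history(hist, old = 300)  # real episodes 2 and 3 become observed 1 and 2
```

This also suggests why large values of max.old are costly: the longer the unobserved history, the more episodes must be generated before follow-up even begins.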

Value

An object of class rec.ev.data.sim. It is a data frame containing the episodes suffered by the corresponding subjects. The columns of the data frame are:
nid: integer number that identifies the subject.
real.episode: number of the episode corresponding to the real history of the individual.
obs.episode: number of the episode corresponding to the follow-up time of the individual.
time: time until the corresponding event happens (or time to subject drop-out), measured from the beginning of follow-up.
status: logical value indicating whether the episode corresponds to an event or a drop-out.
start: time at which an episode starts, taking the beginning of follow-up as the origin of the time scale.
stop: time at which an episode ends, taking the beginning of follow-up as the origin of the time scale.
time2: time until the corresponding event happens (or time to subject drop-out), in calendar time.
start2: time at which an episode starts, where the time scale is calendar time.
stop2: time at which an episode ends, where the time scale is calendar time.
old: real value indicating the time that the individual was at risk before the beginning of follow-up.
risk.bef: factor indicating whether or not an individual was at risk before the beginning of follow-up.
long: time not at risk immediately after an episode.
z: individual heterogeneity generated according to the specified distribution.
x: value of each covariate randomly generated for each subject in the cohort.
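
Because each row carries start, stop and status, the data are in the counting-process layout accepted by the survival package's Surv(start, stop, event). A minimal sketch of an Andersen-Gill type Cox model with robust standard errors clustered on the subject follows; the toy data are generated on the spot purely for illustration, and for a simulated cohort one would use data = sim.data, assuming its covariate column is named x.

```r
library(survival)

# Toy recurrent-event data in the same start/stop/status layout (illustration
# only): exponential gaps whose rate depends on a binary covariate x.
set.seed(4)
toy <- do.call(rbind, lapply(1:200, function(id) {
  x <- rbinom(1, 1, 0.5)
  t <- 0
  rows <- NULL
  repeat {
    gap <- rexp(1, rate = 0.002 * exp(-0.5 * x))
    if (t + gap >= 1825) {               # administratively censored at 1825
      rows <- rbind(rows, data.frame(nid = id, start = t, stop = 1825,
                                     status = 0, x = x))
      break
    }
    rows <- rbind(rows, data.frame(nid = id, start = t, stop = t + gap,
                                   status = 1, x = x))
    t <- t + gap
  }
  rows
}))

# Andersen-Gill model; cluster(nid) gives robust (sandwich) standard errors
fit <- coxph(Surv(start, stop, status) ~ x + cluster(nid), data = toy)
summary(fit)
```

Note that a single Andersen-Gill model assumes a common baseline hazard across episodes, which is precisely what the episode-specific distributions of rec.ev.sim allow one to violate; stratifying by episode number is a common alternative to compare against.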


Details

In order for the function to work properly, the vectors containing the parameters of the time to event and time to censoring distributions and the vectors of distributions dist.ev and dist.cens must all have the same length. In addition, priskb and max.old must be non-negative numbers, with priskb between 0 and 1. Note that large values of max.old can make the routine take a long time to simulate a cohort of the specified size.
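
The constraints above can be checked before calling rec.ev.sim with a small validation function. The helper check_rec_args below is hypothetical and not part of survsim; it reads the length rule as requiring all six distribution and parameter vectors to have the same length, and mirrors the dist.cens default from Usage.

```r
# Hypothetical validation helper (not part of survsim) for the rules in Details.
check_rec_args <- function(dist.ev, anc.ev, beta0.ev,
                           dist.cens = rep("weibull", length(beta0.cens)),
                           anc.cens, beta0.cens, priskb = 0, max.old = 0) {
  lens <- lengths(list(dist.ev, anc.ev, beta0.ev, dist.cens, anc.cens, beta0.cens))
  if (length(unique(lens)) != 1)
    stop("distribution and parameter vectors must all have the same length")
  if (!all(c(dist.ev, dist.cens) %in% c("weibull", "lnorm", "llogistic")))
    stop("distributions must be 'weibull', 'lnorm' or 'llogistic'")
  if (priskb < 0 || priskb > 1) stop("priskb must lie between 0 and 1")
  if (max.old < 0) stop("max.old must be non-negative")
  invisible(TRUE)
}

# Arguments of the example: passes silently
check_rec_args(dist.ev = c("lnorm", "llogistic", "weibull", "weibull"),
               anc.ev = c(1.498, 0.924, 0.923, 1.051),
               beta0.ev = c(7.195, 6.583, 6.678, 6.430),
               anc.cens = c(1.272, 1.218, 1.341, 1.484),
               beta0.cens = c(7.315, 6.975, 6.712, 6.399),
               priskb = 0.5, max.old = 730)
```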

References

Kelly PJ, Lim LL. Survival analysis for recurrent event data: an application to childhood infectious diseases. Stat Med 2000 Jan 15;19(1):13-33.

Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox proportional hazards models. Stat Med 2005 Jun 15;24(11):1713-1723.

Metcalfe C, Thompson SG. The importance of varying the event generation process in simulation studies of statistical methods for recurrent events. Stat Med 2006 Jan 15;25(1):165-179.

Reis RJ, Utzet M, La Rocca PF, Nedel FB, Martin M, Navarro A. Previous sick leaves as predictor of subsequent ones. Int Arch Occup Environ Health 2011 Jun;84(5):491-499.

Navarro A, Moriña D, Reis R, Nedel FB, Martin M, Alvarado S. Hazard functions to describe patterns of new and recurrent sick leave episodes for different diagnoses. Scand J Work Environ Health 2012 Jan 27.

Moriña D, Navarro A. The R package survsim for the simulation of simple and complex survival data. J Stat Softw 2014 Jul;59(2):1-21.

Examples


### A cohort of 500 subjects with a maximum follow-up time of 1825 days and
### a single covariate following a Bernoulli distribution, with a corresponding
### beta of -0.4, -0.5, -0.6 and -0.7 for episodes 1 to 4.

library(survsim)

# dist.cens is omitted, so Weibull censoring is used for every episode
sim.data <- rec.ev.sim(n=500, foltime=1825,
  dist.ev=c('lnorm', 'llogistic', 'weibull', 'weibull'),
  anc.ev=c(1.498, 0.924, 0.923, 1.051),
  beta0.ev=c(7.195, 6.583, 6.678, 6.430),
  anc.cens=c(1.272, 1.218, 1.341, 1.484),
  beta0.cens=c(7.315, 6.975, 6.712, 6.399),
  z=c("unif", 0.8, 1.2),
  beta=list(c(-0.4, -0.5, -0.6, -0.7)),
  x=list(c("bern", 0.5)),
  lambda=c(2.18, 2.33, 2.40, 3.46),
  priskb=0.5, max.old=730)

summary(sim.data)
