survsim-package: Simulation of simple and complex survival data

Description

Simulation of cohorts for simple and complex survival analysis, covering single, multiple and recurrent events, with several covariates, individual heterogeneity and periods at risk before and after the initial time of follow-up.

Distribution	Survival function	Density function	Parametrization
Weibull	$exp(- \lambda t^p)$	$\lambda pt^{p-1}exp(- \lambda t^p)$	$\lambda = exp(-p \beta_0)$
Log-normal	$1- \Theta((log(t)- \mu)/ \sigma)$	$(1/(t \sigma \sqrt{2 \pi})) exp((-1/(2 \sigma^2))(log(t) - \mu)^2)$	$\mu = \beta_0$
Log-logistic	$1/(1+(\lambda t)^{1/ \gamma})$	$\lambda^{1/ \gamma}t^{(1/ \gamma) - 1}/ (\gamma (1 + (\lambda t)^{1/ \gamma})^2)$	$\lambda = exp(- \beta_0)$
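
As a quick check of the parametrization column, the survival functions in the table can be compared with the distribution functions shipped in base R. This is only a sketch: the helper names `S_weibull`, `S_lnorm` and `S_llogistic` and the parameter values are illustrative assumptions, not part of survsim.

```r
# Survival functions written with the symbols used in the table.
# Helper names and parameter values are hypothetical, chosen for illustration.
S_weibull   <- function(t, p, beta0)     exp(-exp(-p * beta0) * t^p)
S_lnorm     <- function(t, sigma, beta0) 1 - pnorm((log(t) - beta0) / sigma)
S_llogistic <- function(t, gamma, beta0) 1 / (1 + (exp(-beta0) * t)^(1 / gamma))

times <- c(0.5, 1, 2, 5, 10)

# Base R equivalents (lower.tail = FALSE returns the survival function):
# Weibull      -> shape = p, scale = exp(beta0)
# Log-normal   -> meanlog = beta0, sdlog = sigma
# Log-logistic -> logistic distribution on the log-time scale
chk_weibull <- all.equal(
  S_weibull(times, p = 1.5, beta0 = 2),
  pweibull(times, shape = 1.5, scale = exp(2), lower.tail = FALSE))
chk_lnorm <- all.equal(
  S_lnorm(times, sigma = 0.8, beta0 = 1),
  plnorm(times, meanlog = 1, sdlog = 0.8, lower.tail = FALSE))
chk_llogistic <- all.equal(
  S_llogistic(times, gamma = 0.5, beta0 = 1),
  plogis((log(times) - 1) / 0.5, lower.tail = FALSE))

c(weibull = chk_weibull, lnorm = chk_lnorm, llogistic = chk_llogistic)
```

Under this parametrization $exp(\beta_0)$ acts as a scale parameter on the time axis for all three distributions, which is why the same $\beta_0$ can carry the covariate effects in each case.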

Distribution	Time
Weibull	$t = (- ln u/ \lambda)^{1/p}$
Log-normal	$t = exp(\beta_0 + \sigma \Theta^{-1}(u))$
Log-logistic	$t = exp(\beta_0 + \gamma (log(u)-log(1-u)))$

Where $\Theta$ is the standard normal cumulative distribution function and $u$ is a draw from a uniform distribution on $(0,1)$.
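
The time formulas follow from solving $S(t) = u$ for $t$ with each survival function in the first table (inverse transform sampling). A minimal base R sketch is given below; the function `sim_time` and its arguments `anc` and `beta0` are hypothetical names used for illustration and do not reproduce the package's internal code.

```r
# Inverse transform sampling for the three distributions.
# sim_time() is a hypothetical helper, not a survsim function.
# anc is the ancillary parameter: p (Weibull), sigma (log-normal),
# gamma (log-logistic).
sim_time <- function(n, dist = c("weibull", "lnorm", "llogistic"), anc, beta0) {
  dist <- match.arg(dist)
  u <- runif(n)                          # u ~ Uniform(0, 1)
  switch(dist,
    weibull = {
      lambda <- exp(-anc * beta0)        # lambda = exp(-p * beta0)
      (-log(u) / lambda)^(1 / anc)
    },
    lnorm     = exp(beta0 + anc * qnorm(u)),
    llogistic = exp(beta0 + anc * (log(u) - log(1 - u)))
  )
}

set.seed(1)
t_w <- sim_time(1e5, "weibull", anc = 1.5, beta0 = 2)

# Empirical survival at t = 5 against the closed form exp(-lambda * t^p)
surv_check <- c(empirical   = mean(t_w > 5),
                theoretical = exp(-exp(-1.5 * 2) * 5^1.5))
surv_check
```

For the log-normal and log-logistic cases the median of the simulated times should be close to $exp(\beta_0)$, since both $\Theta^{-1}(u)$ and $log(u)-log(1-u)$ are zero at $u = 0.5$.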

To simulate censored survival data, two survival distributions are required: one for the uncensored survival times that would be observed if the follow-up had been sufficiently long to reach the event, and another representing the censoring mechanism. The uncensored survival times, $T'_i$, for $i=1,\ldots,n$ subjects, can be generated to depend on a set of covariates with a specified relationship with survival, which represents the true prognostic importance of each covariate (Burton et al., 2006). The package simulates times from Weibull (with the exponential as a particular case), log-normal and log-logistic distributions, as shown in the tables above. To induce individual heterogeneity or within-subject correlation we generate $Z_i$, a random effect covariate that follows a particular distribution (uniform or normal).

$$t_i = t_i'\cdot z_i$$

When $z_i = 1$ for all subjects, we are in the case of individual homogeneity and the survival times are completely specified by the covariates. Random non-informative right censoring, $C_i$, can be generated in a similar manner to the uncensored survival times, $T'_i$, by assuming a particular distribution for the censoring times (see the tables above), but without including any covariates or individual heterogeneity. The observation times, $Y_i'$, incorporating both events and censored observations, are calculated for each case by combining the uncensored survival times, $T_i$, and the censoring times, $C_i$. If the uncensored survival time for an observation is less than or equal to the censoring time, the event is considered to be observed and the observation time equals the uncensored survival time; otherwise the event is considered censored and the observation time equals the censoring time. In other words, once $t_i$ and $c_i$ have been simulated, we can define $y_i'= min(t_i,c_i)$ as the observation time, with $\delta_i$ an indicator of non-censoring, i.e. $\delta_i = I(t_i \le c_i )$. While all $y_i'$ start at 0, the package also allows dynamic cohorts to be created. We can generate entry times greater than 0 by adding a value $t_0$ drawn from a uniform distribution on $[0,t_{max follow-up}]$. We can also simulate subjects at risk before the initial time of follow-up $(y_i'= 0)$ by drawing $t_0$ from a uniform distribution on $[-t_{max old},0)$ for a fixed percentage of subjects. Then:

$$y_i=y_i' + t_0$$

where $t_0$ follows a uniform distribution on $[0,t_{max follow-up}]$ if the entry time is 0 or more, and a uniform distribution on $[-t_{max old}, 0)$ if the entry time is less than 0. Therefore, $t_0$ represents the initial point of the episode, $y_i$ the endpoint and $y_i'$ the length. Note that $y_i'+t_0$ can be greater than $t_{max follow-up}$; in this case $y_i$ is set to $t_{max follow-up}$ and $\delta_i=0$. Observations corresponding to subjects at risk before the initial time of follow-up have a negative $t_0$, so the initial point of the episode is set to 0. $y_i$ may also be negative; in this case the episode is not included in the simulated data, since it would not be observed in practice.
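
The scheme described above (heterogeneity, censoring, entry times and truncation at the end of follow-up) can be sketched in base R for a single-event cohort without covariates. All object names and parameter values below are assumptions made for illustration; this is not the code used by the package.

```r
# Illustrative single-event cohort; names and values are hypothetical.
set.seed(42)
n        <- 1000
foltime  <- 365    # t_max follow-up
max_old  <- 100    # t_max old
prop_old <- 0.2    # fixed proportion of subjects at risk before time 0

# Uncensored times t'_i (Weibull, p = 1.2, beta0 = 5.5) and random effect z_i
p <- 1.2
lambda  <- exp(-p * 5.5)
t_prime <- (-log(runif(n)) / lambda)^(1 / p)
z       <- runif(n, 0.8, 1.2)      # z = rep(1, n) would give homogeneity
t_i     <- t_prime * z

# Censoring times c_i: no covariates and no heterogeneity
p_c <- 1
lambda_c <- exp(-p_c * 6.5)
c_i <- (-log(runif(n)) / lambda_c)^(1 / p_c)

# Episode length y'_i and non-censoring indicator delta_i
y_prime <- pmin(t_i, c_i)
delta   <- as.integer(t_i <= c_i)

# Entry time t_0: negative for subjects already at risk before follow-up
old <- runif(n) < prop_old
t0  <- ifelse(old, runif(n, -max_old, 0), runif(n, 0, foltime))
y   <- y_prime + t0

# Episodes running past the end of follow-up are cut there and censored
over        <- y > foltime
y[over]     <- foltime
delta[over] <- 0L

# Negative starting points are set to 0; episodes ending before 0 are dropped
cohort <- data.frame(start = pmax(t0, 0), stop = y, status = delta)
cohort <- cohort[y >= 0, ]
head(cohort)
```

Each row of `cohort` is one observed episode in start/stop form, with `status` equal to 1 when the event was observed within the follow-up window.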

Details

Package:	survsim
Type:	Package
Version:	1.1.8
Date:	2021-12-13
License:	GPL version 2 or newer
LazyLoad:	yes

The package provides tools for simulating cohorts in a simple single-event context (function simple.surv.sim), in a recurrent event context (function rec.ev.sim), in a multiple event context (function mult.ev.sim) and in a competing risks context (function crisk.sim). It also allows the user to generate aggregated data from the simulated cohort by means of the function accum.

References

Kelly PJ, Lim LL. Survival analysis for recurrent event data: an application to childhood infectious diseases. Stat Med 2000 Jan 15;19(1):13-33.

Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox proportional hazards models. Stat Med 2005 Jun 15;24(11):1713-1723.

Metcalfe C, Thompson SG. The importance of varying the event generation process in simulation studies of statistical methods for recurrent events. Stat Med 2006 Jan 15;25(1):165-179.

Burton A, Altman DG, Royston P, Holder RL. The design of simulation studies in medical statistics. Stat Med 2006 Dec 30;25(24):4279-4292.

Beyersmann J, Latouche A, Buchholz A, Schumacher M. Simulating competing risks data in survival analysis. Stat Med 2009 Jan 5;28(1):956-971.

Reis RJ, Utzet M, La Rocca PF, Nedel FB, Martin M, Navarro A. Previous sick leaves as predictor of subsequent ones. Int Arch Occup Environ Health 2011 Jun;84(5):491-499.

Navarro A, Moriña D, Reis R, Nedel FB, Martin M, Alvarado S. Hazard functions to describe patterns of new and recurrent sick leave episodes for different diagnoses. Scand J Work Environ Health 2012 Jan 27.

Moriña D, Navarro A. The R package survsim for the simulation of simple and complex survival data. Journal of Statistical Software 2014 Jul; 59(2):1-20.
