BayesID_AFT: The function to implement Bayesian parametric and semi-parametric analyses for semi-competing risks data in the context of accelerated failure time (AFT) models.

Description

Independent semi-competing risks data can be analyzed using AFT models that have a hierarchical structure. The proposed models can accomodate left-truncated and/or interval-censored data. An efficient computational algorithm that gives users the flexibility to adopt either a fully parametric (log-Normal) or a semi-parametric (Dirichlet process mixture) model specification is developed.

Usage

BayesID_AFT(Formula, data, model = "LN", hyperParams, startValues,
mcmcParams, na.action = "na.fail", subset=NULL, path=NULL)

Arguments

Formula

a Formula object, with the outcomes on the left of a $\sim$, and covariates on the right. It is of the form, left truncation time | interval- (or right-) censored time for non-terminal event | interval-(or right-) censored time for terminal event $\sim$ covariates for $h_1$ | covariates for $h_2$ | covariates for $h_3$: i.e., $L$ | $y_{1L}$+$y_{1U}$ | $y_{2L}$+$y_{2U}$ ~ $x_1$ | $x_2$ | $x_3$.

data

a data.frame in which to interpret the variables named in Formula.

model

The specification of baseline survival distribution: "LN" or "DPM".

hyperParams

a list containing lists or vectors for hyperparameter values in hierarchical models. Components include, theta (a numeric vector for hyperparameter in the prior of subject-specific frailty variance component), LN (a list containing numeric vectors for log-Normal hyperparameters: LN.ab1, LN.ab2, LN.ab3), DPM (a list containing numeric vectors for DPM hyperparameters: DPM.mu1, DPM.mu2, DPM.mu3, DPM.sigSq1, DPM.sigSq2, DPM.sigSq3, DPM.ab1, DPM.ab2, DPM.ab3, Tau.ab1, Tau.ab2, Tau.ab3). See Details and Examples below.

startValues

a list containing vectors of starting values for model parameters. It can be specified as the object returned by the function initiate.startValues_AFT.

mcmcParams

a list containing variables required for MCMC sampling. Components include, run (a list containing numeric values for setting for the overall run: numReps, total number of scans; thin, extent of thinning; burninPerc, the proportion of burn-in). storage (a list containing numeric values for storing posterior samples for subject- and cluster-specific random effects: nGam_save, the number of $\gamma$ to be stored; nY1_save, the number of $y1$ to be stored; nY2_save, the number of $y2$ to be stored; nY1.NA_save, the number of $y1.NA$ to be stored). tuning (a list containing numeric values relevant to tuning parameters for specific updates in Metropolis-Hastings (MH) algorithm: betag.prop.var, the variance of proposal density for $\beta_g$; mug.prop.var, the variance of proposal density for $\mu_{g}$; zetag.prop.var, the variance of proposal density for $1/\sigma_g^2$; gamma.prop.var, the variance of proposal density for $\gamma$). See Details and Examples below.

na.action

how NAs are treated. See model.frame.

subset

a specification of the rows to be used: defaults to all rows. See model.frame.

path

the name of directory where the results are saved.

Value

BayesID_AFT returns an object of class Bayes_AFT.

Details

We view the semi-competing risks data as arising from an underlying illness-death model system in which individuals may undergo one or more of three transitions: 1) from some initial condition to non-terminal event, 2) from some initial condition to terminal event, 3) from non-terminal event to terminal event. Let $T_{i1}$, $T_{i2}$ denote time to non-terminal and terminal event from subject $i=1,...,n$. We propose to directly model the times of the events via the following AFT model specification:

$$\log(T_{i1}) = x_{i1}^\top\beta_1 + \gamma_i + \epsilon_{i1}, T_{i1} > 0,$$ $$\log(T_{i2}) = x_{i2}^\top\beta_2 + \gamma_i + \epsilon_{i2}, T_{i2} > 0,$$ $$\log(T_{i2} - T_{i1}) = x_{i3}^\top\beta_3 + \gamma_i + \epsilon_{i3}, T_{i2} > T_{i1},$$ where $x_{ig}$ is a vector of transition-specific covariates, $\beta_g$ is a corresponding vector of transition-specific regression parameters and $\epsilon_{ig}$ is a transition-specific random variable whose distribution determines that of the corresponding transition time, $g \in \{1,2,3\}$. $\gamma_i$ is a study participant-specific random effect that induces positive dependence between the two event times, thereby performing a role analogous to that performed by frailties in models for the hazard function. Let $L_{i}$ denote the time at study entry (i.e. the left-truncation time). Furthermore, suppose that study participant $i$ was observed at follow-up times $\{c_{i1},\ldots, c_{im_i}\}$ and let $c_i^*$ denote the time to the end of study or to administrative right-censoring. Considering interval-censoring for both events, the times to non-terminal and terminal event for the $i^{th}$ study participant satisfy $c_{ij}\leq T_{i1}< c_{ij+1}$ for some $j$ and $c_{ik}\leq T_{i2}< c_{ik+1}$ for some $k$, respectively. Then the observed outcomes for the $i^{th}$ study participant can be succinctly denoted by $\{c_{ij}, c_{ij+1}, c_{ik}, c_{ik+1}, L_{i}\}$.

For the Bayesian semi-parametric analysis, we proceed by adopting independent DPM of normal distributions for each $\epsilon_{ig}$. More precisely, $\epsilon_{ig}$ is taken to be an independent draw from a mixture of $M_g$ normal distributions with means and variances ($\mu_{gr}$, $\sigma_{gr}^2$), for $r \in \{1,\ldots,M_g\}$. Since the class-specific $(\mu_{gr}, \sigma_{gr}^2)$ are not known, they are taken to be draws from some common distribution, $G_{g0}$, often referred to as the centering distribution. Furthermore, since the `true' class membership for any given study participant is not known, we let $p_{gr}$ denote the probability of belonging to the $r^{th}$ class for transition $g$ and $p_g$ = $(p_{g1}, \ldots, p_{gM_g})$ the collection of such probabilities. Note, $p_g$ is defined at the level of the population (i.e. is not study participant-specific) and its components add up to 1.0. In the absence of prior knowledge regarding the distribution of class memberships for the $n$ individuals across the $M_g$ classes, $p_g$ is assumed to follow a conjugate symmetric Dirichlet$(\tau_g/M_g,\ldots,\tau_g/M_g)$ distribution, where $\tau_g$ is referred to as the precision parameter. The finite mixture distribution can then be succinctly represented as: $$\epsilon_{ig} | r_{i} \sim Normal(\mu_{r_{i}}, \sigma_{r_{i}}^2),$$ $$(\mu_{gr}, \sigma_{gr}^2) \sim G_{g0}, ~~for~ r=1,\ldots,M_g,$$ $$r_{i}| p_g \sim Discrete(r_{i} | p_{g1},\ldots,p_{gM_g}),$$ $$p_g \sim Dirichlet(\tau_g/M_g, \ldots, \tau_g/M_g).$$ Letting $M_g$ approach infinity, this specification is referred to as a DPM of normal distributions. In our proposed framework, we specify a Gamma($a_{\tau_g}$, $b_{\tau_g}$) hyperprior for $\tau_g$. For regression parameters, we adopt non-informative flat priors on the real line. For $\gamma$=$\{\gamma_1, \ldots, \gamma_n\}$, we assume that each $\gamma_i$ is an independent random draw from a Normal(0, $\theta$) distribution. In the absence of prior knowledge on the variance component $\theta$, we adopt a conjugate inverse-Gamma hyperprior, IG($a_\theta$, $b_\theta$). Finally, We take the $G_{g0}$ as a normal distribution centered at $\mu_{g0}$ with a variance $\sigma_{g0}^2$ for $\mu_{gr}$ and an IG($a_{\sigma_g}$, $b_{\sigma_g}$) for $\sigma_{gr}^2$.

For the Bayesian parametric analysis, we build on the log-Normal formulation and take the $\epsilon_{ig}$ to follow independent Normal($\mu_g$, $\sigma_g^2$) distributions, $g$=1,2,3. For location parameters $\{\mu_1, \mu_2, \mu_3\}$, we adopt non-informative flat priors on the real line. For $\{\sigma_1^2, \sigma_2^2, \sigma_3^2\}$, we adopt independent inverse Gamma distributions, denoted IG($a_{\sigma g}$, $b_{\sigma g}$). For $\beta_g$, $\gamma$, and $\theta$, we adopt the same priors as those adopted for the DPM model.

References

Lee, K. H., Rondeau, V., and Haneuse, S. (2017), Accelerated failure time models for semicompeting risks data in the presence of complex censoring, Biometrics, 73, 4, 1401-1412. Alvares, D., Haneuse, S., Lee, C., Lee, K. H. (2019), SemiCompRisks: An R package for the analysis of independent and cluster-correlated semi-competing risks data, The R Journal, 11, 1, 376-400.

Examples

Run this code

# NOT RUN {
# }
# NOT RUN {
# loading a data set
data(scrData)
scrData$y1L <- scrData$y1U <- scrData[,1]
scrData$y1U[which(scrData[,2] == 0)] <- Inf
scrData$y2L <- scrData$y2U <- scrData[,3]
scrData$y2U[which(scrData[,4] == 0)] <- Inf
scrData$LT <- rep(0, dim(scrData)[1])

form <- Formula(LT | y1L + y1U | y2L + y2U  ~ x1 + x2 + x3 | x1 + x2 | x1 + x2)

#####################
## Hyperparameters ##
#####################

## Subject-specific random effects variance component
##
theta.ab <- c(0.5, 0.05)

## log-Normal model
##
LN.ab1 <- c(0.3, 0.3)
LN.ab2 <- c(0.3, 0.3)
LN.ab3 <- c(0.3, 0.3)

## DPM model
##
DPM.mu1 <- log(12)
DPM.mu2 <- log(12)
DPM.mu3 <- log(12)

DPM.sigSq1 <- 100
DPM.sigSq2 <- 100
DPM.sigSq3 <- 100

DPM.ab1 <-  c(2, 1)
DPM.ab2 <-  c(2, 1)
DPM.ab3 <-  c(2, 1)

Tau.ab1 <- c(1.5, 0.0125)
Tau.ab2 <- c(1.5, 0.0125)
Tau.ab3 <- c(1.5, 0.0125)

##
hyperParams <- list(theta=theta.ab,
LN=list(LN.ab1=LN.ab1, LN.ab2=LN.ab2, LN.ab3=LN.ab3),
DPM=list(DPM.mu1=DPM.mu1, DPM.mu2=DPM.mu2, DPM.mu3=DPM.mu3, DPM.sigSq1=DPM.sigSq1,
DPM.sigSq2=DPM.sigSq2, DPM.sigSq3=DPM.sigSq3, DPM.ab1=DPM.ab1, DPM.ab2=DPM.ab2,
DPM.ab3=DPM.ab3, Tau.ab1=Tau.ab1, Tau.ab2=Tau.ab2, Tau.ab3=Tau.ab3))

###################
## MCMC SETTINGS ##
###################

## Setting for the overall run
##
numReps    <- 300
thin       <- 3
burninPerc <- 0.5

## Setting for storage
##
nGam_save <- 10
nY1_save <- 10
nY2_save <- 10
nY1.NA_save <- 10

## Tuning parameters for specific updates
##
##  - those common to all models
betag.prop.var	<- c(0.01,0.01,0.01)
mug.prop.var	<- c(0.1,0.1,0.1)
zetag.prop.var	<- c(0.1,0.1,0.1)
gamma.prop.var	<- 0.01

##
mcmcParams	<- list(run=list(numReps=numReps, thin=thin, burninPerc=burninPerc),
storage=list(nGam_save=nGam_save, nY1_save=nY1_save, nY2_save=nY2_save, nY1.NA_save=nY1.NA_save),
tuning=list(betag.prop.var=betag.prop.var, mug.prop.var=mug.prop.var,
zetag.prop.var=zetag.prop.var, gamma.prop.var=gamma.prop.var))

#################################################################
## Analysis of Independent Semi-competing risks data ############
#################################################################

###############
## logNormal ##
###############

##
myModel <- "LN"
myPath  <- "Output/01-Results-LN/"

startValues      <- initiate.startValues_AFT(form, scrData, model=myModel, nChain=2)

##
fit_LN <- BayesID_AFT(form, scrData, model=myModel, hyperParams,
startValues, mcmcParams, path=myPath)

fit_LN
summ.fit_LN <- summary(fit_LN); names(summ.fit_LN)
summ.fit_LN
pred_LN <- predict(fit_LN, time = seq(0, 35, 1), tseq=seq(from=0, to=30, by=5))
plot(pred_LN, plot.est="Haz")
plot(pred_LN, plot.est="Surv")

#########
## DPM ##
#########

##
myModel <- "DPM"
myPath  <- "Output/02-Results-DPM/"

startValues      <- initiate.startValues_AFT(form, scrData, model=myModel, nChain=2)

##
fit_DPM <- BayesID_AFT(form, scrData, model=myModel, hyperParams,
startValues, mcmcParams, path=myPath)

fit_DPM
summ.fit_DPM <- summary(fit_DPM); names(summ.fit_DPM)
summ.fit_DPM
pred_DPM <- predict(fit_DPM, time = seq(0, 35, 1), tseq=seq(from=0, to=30, by=5))
plot(pred_DPM, plot.est="Haz")
plot(pred_DPM, plot.est="Surv")
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab