bayessurvreg3: Cluster-specific accelerated failure time model for multivariate, possibly doubly-interval-censored data with flexibly specified random effects and/or error distribution.

Description

A function to estimate a regression model with possibly clustered (possibly right, left, interval or doubly-interval censored) data. In the case of doubly-interval censoring, different regression models can be specified for the onset and event times.

A univariate random effect (random intercept) with the distribution expressed as a penalized normal mixture can be included in the model to adjust for clusters.

The error density of the regression model is specified as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variances). This function performs an MCMC sampling from the posterior distribution of unknown quantities.

For details, see Komárek (2006) and Komárek and Lesaffre (2008).

SUPPLEMENTED IN 06/2013: Interval-censored times might be subject to misclassification. In case of doubly-interval-censored data, the event time might be subject to misclassification. For details, see GKJ (2013+). We explain first in more detail a model without doubly censoring. Let $T_{i,l},\; i=1,\dots, N,\; l=1,\dots, n_i$ be event times for $i$th cluster and the units within that cluster The following regression model is assumed: $$\log(T_{i,l}) = \beta'x_{i,l} + b_i + \varepsilon_{i,l},\quad i=1,\dots, N,\;l=1,\dots, n_i$$ where $\beta$ is unknown regression parameter vector, $x_{i,l}$ is a vector of covariates. $b_i$ is a cluster-specific random effect (random intercept).

The random effects $b_i,\;i=1,\dots, N$ are assumed to be i.i.d. with a univariate density $g_{b}(b)$. The error terms $\varepsilon_{i,l},\;i=1,\dots, N, l=1,\dots, n_i$ are assumed to be i.i.d. with a univariate density $g_{\varepsilon}(e)$. Densities $g_{b}$ and $g_{\varepsilon}$ are both expressed as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variances). We distinguish two, theoretically equivalent, specifications.

In the following, the density for $\varepsilon$ is explicitely described. The density for $b$ is obtained in an analogous manner.

[object Object],[object Object] Personally, I found Specification 2 performing better. In the paper Komárek and Lesaffre (2008) only Specification 2 is described.

The mixture weights $w_{j},\;j=-K,\dots, K$ are not estimated directly. To avoid the constraints $0 < w_{j} < 1$ and $\sum_{j=-K}^{K}\,w_j = 1$ transformed weights $a_{j},\;j=-K,\dots, K$ related to the original weights by the logistic transformation: $$a_{j} = \frac{\exp(w_{j})}{\sum_{m}\exp(w_{m})}$$ are estimated instead.

A Bayesian model is set up for all unknown parameters. For more details I refer to Komárek and Lesaffre (2008). If there are doubly-censored data the model of the same type as above can be specified for both the onset time and the time-to-event.

In the case one wishes to link the random intercept of the onset model and the random intercept of the time-to-event model, there are the following possibilities.

Bivariate normal distribution It is assumed that the pair of random intercepts from the onset and time-to-event part of the model are normally distributed with zero mean and an unknown covariance matrix $D$.

A priori, the inverse covariance matrix $D^{-1}$ is addumed to follow a Wishart distribution.

Unknown correlation between the basis G-splines Each pair of basis G-splines describing the distribution of the random intercept in the onset part and the time-to-event part of the model is assumed to be correlated with an unknown correlation coefficient $\varrho$. Note that this is just an experiment and is no more further supported.

Prior distribution on $\varrho$ is assumed to be uniform. In the MCMC, the Fisher Z transform of the $\varrho$ given by $$Z = -\frac{1}{2}\log\Bigl(\frac{1-\varrho}{1+\varrho}\Bigr)=\mbox{atanh}(\varrho)$$ is sampled. Its prior is derived from the uniform prior $\mbox{Unif}(-1,\;1)$ put on $\varrho.$

The Fisher Z transform is updated using the Metropolis-Hastings alhorithm. The proposal distribution is given either by a normal approximation obtained using the Taylor expansion of the full conditional distribution or by a Langevin proposal (see Robert and Casella, 2004, p. 318).

Usage

bayessurvreg3(formula, random, formula2, random2,
   data = parent.frame(),
   classification,
   classParam = list(Model = c("Examiner", "Factor:Examiner"),
                     a.sens = 1, b.sens = 1, a.spec = 1, b.spec = 1,
                     init.sens = NULL, init.spec = NULL),
   na.action = na.fail, onlyX = FALSE,
   nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10),   
   prior, prior.beta, prior.b, init = list(iter = 0),
   mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1,
                   k.overrelax.sigma = 1, k.overrelax.scale = 1,
                   type.update.a.b = "slice", k.overrelax.a.b = 1,
                   k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
   prior2, prior.beta2, prior.b2, init2,
   mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1,
                    k.overrelax.sigma = 1, k.overrelax.scale = 1,
                    type.update.a.b = "slice", k.overrelax.a.b = 1,
                    k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
   priorinit.Nb,
   rho = list(type.update = "fixed.zero", init=0, sigmaL=0.1),
   store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE,
                r = FALSE, r2 = FALSE, b = FALSE, b2 = FALSE,
                a.b = FALSE, a.b2 = FALSE, r.b = FALSE, r.b2 = FALSE), 
   dir = getwd())
bayessurvreg3Para(formula, random, formula2, random2,
   data = parent.frame(),
   classification,
   classParam = list(Model = c("Examiner", "Factor:Examiner"),
                     a.sens = 1, b.sens = 1, a.spec = 1, b.spec = 1,
                     init.sens = NULL, init.spec = NULL),
   na.action = na.fail, onlyX = FALSE,
   nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10),   
   prior, prior.beta, prior.b, init = list(iter = 0),
   mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1,
                   k.overrelax.sigma = 1, k.overrelax.scale = 1,
                   type.update.a.b = "slice", k.overrelax.a.b = 1,
                   k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
   prior2, prior.beta2, prior.b2, init2,
   mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1,
                    k.overrelax.sigma = 1, k.overrelax.scale = 1,
                    type.update.a.b = "slice", k.overrelax.a.b = 1,
                    k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
   priorinit.Nb,
   rho = list(type.update = "fixed.zero", init=0, sigmaL=0.1),
   store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE,
                r = FALSE, r2 = FALSE, b = FALSE, b2 = FALSE,
                a.b = FALSE, a.b2 = FALSE, r.b = FALSE, r.b2 = FALSE), 
   dir = getwd())

Arguments

formula

model formula for the regression. In the case of doubly-censored data, this is the model formula for the onset time.

The left-hand side of the formula must be an object created using

random

formula for the `random' part of the model. In the case of doubly-censored data, this is the random formula for the onset time. With this version of the function only c{ random = ~1 }

formula2

model formula for the regression of the time-to-event in the case of doubly-censored data. Ignored otherwise. The same structure as for formula applies here.

random2

specification of the `random' part of the model for time-to-event in the case of doubly-censored data. Ignored otherwise. The same structure as for random applies here.

data

optional data frame in which to interpret the variables occuring in the formula, formula2, random, random2 statements.

classification

data.frame with the information for a model which considers misclassification of the event times. It is assumed to have the following columns where the position of columns is important, not their names:

idUnit:

Value

A list of class bayessurvreg3 containing an information concerning the initial values and prior choices.

item

classParam
na.action
onlyX
nsimul
prior
prior.b
prior.beta
init
mcmc.par
prior2
prior.b2
prior.beta2
init2
mcmc.par2
priorinit.Nb
rho
store
dir

code

priorinit.Nb

describe

aif TRUE then all the transformed mixture weights $a_{k},$ $k=-K,\dots,K,$ related to the G-spline defining the error distribution of formula are stored.
a.bif TRUE then all the transformed mixture weights $a_{k},$ $k=-K,\dots,K,$ related to the G-spline defining the distribution of the random intercept from formula and random are stored.
a2if TRUE and there are doubly-censored data then all the transformed mixture weights $a_{k},$ $k=-K,\dots,K,$ related to the G-spline defining the error distribution of formula2 are stored.
a.b2if TRUE then all the transformed mixture weights $a_{k},$ $k=-K,\dots,K,$ related to the G-spline defining the distribution of the random intercept from formula2 and random2 are stored.
yif TRUE then augmented log-event times for all observations related to the formula are stored.
y2if TRUE then augmented log-event times for all observations related to formula2 are stored.
rif TRUE then labels of mixture components for residuals related to formula are stored.
r.bif TRUE then labels of mixture components for random intercepts related to formula and random are stored.
r2if TRUE then labels of mixture components for residuals related to formula2 are stored.
r.b2if TRUE then labels of mixture components for random intercepts related to formula2 and random2 are stored.
bif TRUE then the sampled values of the random interceptss related to formula and random are stored.
b2if TRUE then the sampled values of the random interceptss related to formula2 and random2 are stored.

sQuote

Specification
Specification

tabular

cr

k.overrelax.a.b k.overrelax.sigma.b k.overrelax.scale.b

eqn

$0$

References

GKJ (2013+). PAPER BEING PREPARED. WHO KNOWS WHERE PUBLISHED. Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen. Komárek, A. and Lesaffre, E. (2008). Bayesian accelerated failure time model with multivariate doubly-interval-censored data and flexible distributional assumptions. Journal of the American Statistical Association, 103, 523--533.

Robert C. P. and Casella, G. (2004). Monte Carlo Statistical Methods, Second Edition. New York: Springer Science+Business Media.

Examples

Run this code

## See the description of R commands for
## the cluster specific AFT model
## with the Signal Tandmobiel data,
## analysis described in Komarek and Lesaffre (2007).
##
## R commands available in the documentation
## directory of this package
## - see ex-tandmobCS.R and
##   http://www.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobCS.pdf
##

Run the code above in your browser using DataLab