bayessurvreg3: Cluster-specific accelerated failure time model for multivariate, possibly doubly-interval-censored data with flexibly specified random effects and/or error distribution.

Description

A function to estimate a regression model with possibly clustered (possibly right, left, interval or doubly-interval censored) data. In the case of doubly-interval censoring, different regression models can be specified for the onset and event times.

A~univariate random effect (random intercept) with the distribution expressed as a~penalized normal mixture can be included in the model to adjust for clusters.

The error density of the regression model is specified as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variances). This function performs an MCMC sampling from the posterior distribution of unknown quantities.

For details, see $\mbox{Kom\'{a}rek}$ (2006) and $\mbox{Kom\'{a}rek}$ and Lesaffre (2006).

We explain first in more detail a model without doubly censoring. Let $T_{i,l},\; i=1,\dots, N,\; l=1,\dots, n_i$ be event times for $i$th cluster and the units within that cluster The following regression model is assumed: $$\log(T_{i,l}) = \beta'x_{i,l} + b_i + \varepsilon_{i,l},\quad i=1,\dots, N,\;l=1,\dots, n_i$$ where $\beta$ is unknown regression parameter vector, $x_{i,l}$ is a vector of covariates. $b_i$ is a cluster-specific random effect (random intercept).

The random effects $b_i,\;i=1,\dots, N$ are assumed to be i.i.d. with a~univariate density $g_{b}(b)$. The error terms $\varepsilon_{i,l},\;i=1,\dots, N, l=1,\dots, n_i$ are assumed to be i.i.d. with a~univariate density $g_{\varepsilon}(e)$. Densities $g_{b}$ and $g_{\varepsilon}$ are both expressed as a~mixture of Bayesian G-splines (normal densities with equidistant means and constant variances). We distinguish two, theoretically equivalent, specifications.

In the following, the density for $\varepsilon$ is explicitely described. The density for $b$ is obtained in an analogous manner.

[object Object],[object Object] Personally, I found Specification 2 performing better. In the paper $\mbox{Kom{\'a}rek}$ and Lesaffre (2006) only Specification 2 is described.

The mixture weights $w_{j},\;j=-K,\dots, K$ are not estimated directly. To avoid the constraints $0 < w_{j} < 1$ and $\sum_{j=-K}^{K}\,w_j = 1$ transformed weights $a_{j},\;j=-K,\dots, K$ related to the original weights by the logistic transformation: $$a_{j} = \frac{\exp(w_{j})}{\sum_{m}\exp(w_{m})}$$ are estimated instead.

A~Bayesian model is set up for all unknown parameters. For more details I refer to $\mbox{Kom\'{a}rek and Lesaffre (2006)}$ (manuscript can be found in the documentation of this package) and to $\mbox{Kom\'{a}rek}$ (2006). If there are doubly-censored data the model of the same type as above can be specified for both the onset time and the time-to-event.

In the case one wishes to link the random intercept of the onset model and the random intercept of the time-to-event model, there are the following possibilities.

Bivariate normal distribution It is assumed that the pair of random intercepts from the onset and time-to-event part of the model are normally distributed with zero mean and an unknown covariance matrix $D$.

A priori, the inverse covariance matrix $D^{-1}$ is addumed to follow a Wishart distribution.

Unknown correlation between the basis G-splines Each pair of basis G-splines describing the distribution of the random intercept in the onset part and the time-to-event part of the model is assumed to be correlated with an unknown correlation coefficient $\varrho$. Note that this is just an experiment and is no more further supported.

Prior distribution on $\varrho$ is assumed to be uniform. In the MCMC, the Fisher Z transform of the $\varrho$ given by $$Z = -\frac{1}{2}\log\Bigl(\frac{1-\varrho}{1+\varrho}\Bigr)=\mbox{atanh}(\varrho)$$ is sampled. Its prior is derived from the uniform prior $\mbox{Unif}(-1,\;1)$ put on $\varrho.$

The Fisher Z transform is updated using the Metropolis-Hastings alhorithm. The proposal distribution is given either by a normal approximation obtained using the Taylor expansion of the full conditional distribution or by a Langevin proposal (see Robert and Casella, 2004, p. 318).

Usage

bayessurvreg3(formula, random, formula2, random2,
   data = parent.frame(),
   na.action = na.fail, onlyX = FALSE,
   nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10),   
   prior, prior.beta, prior.b, init = list(iter = 0),
   mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1,
                   k.overrelax.sigma = 1, k.overrelax.scale = 1,
                   type.update.a.b = "slice", k.overrelax.a.b = 1,
                   k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
   prior2, prior.beta2, prior.b2, init2,
   mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1,
                    k.overrelax.sigma = 1, k.overrelax.scale = 1,
                   type.update.a.b = "slice", k.overrelax.a.b = 1,
                   k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
   priorinit.Nb,
   rho = list(type.update = "fixed.zero", init=0, sigmaL=0.1),
   store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE,
                r = FALSE, r2 = FALSE, b = FALSE, b2 = FALSE,
                a.b = FALSE, a.b2 = FALSE, r.b = FALSE, r.b2 = FALSE), 
   dir = getwd())

Arguments

formula

model formula for the regression. In the case of doubly-censored data, this is the model formula for the onset time.

The left-hand side of the formula must be an~object created using

random

formula for the `random' part of the model. In the case of doubly-censored data, this is the random formula for the onset time. With this version of the function only c{ random = ~1 }

formula2

model formula for the regression of the time-to-event in the case of doubly-censored data. Ignored otherwise. The same structure as for formula applies here.

random2

specification of the `random' part of the model for time-to-event in the case of doubly-censored data. Ignored otherwise. The same structure as for random applies here.

data

optional data frame in which to interpret the variables occuring in the formula, formula2, random, random2 statements.

na.action

the user is discouraged from changing the default value na.fail.

onlyX

if TRUE no MCMC sampling is performed and only the design matrix (matrices) are returned. This can be useful to set up correctly priors for regression parameters in the presence of factor covariates.

nsimul

a list giving the number of iterations of the MCMC and other parameters of the simulation. [object Object],[object Object],[object Object],[object Object]

prior

a~list specifying the prior distribution of the G-spline defining the distribution of the error term in the regression model given by formula and random. See prior argument of

prior.b

a~list specifying the prior distribution of the G-spline defining the distribution of the random intercept in the regression model given by formula and random. See prior argument of

prior.beta

prior specification for the regression parameters, in the case of doubly-censored data for the regression parameters of the onset time, i.e. it is related to formula and random. This should be a~list with the

init

an~optional list with initial values for the MCMC related to the model given by formula and random. The list can have the following components: [object Object],[object Object],[object Object],[object Object],[object Objec

mcmc.par

a list specifying how some of the G-spline parameters related to the distribution of the error term and of the random intercept from formula and random are to be updated. See

prior2

a list specifying the prior distribution of the G-spline defining the distribution of the error term in the regression model given by formula2 and random2. See prior argument of

prior.b2

prior specification for the parameters related to the random effects from formula2 and random2. This should be a~list with the same structure as prior.b.

It is ignored if the argument priorinit.Nb<

prior.beta2

prior specification for the regression parameters of time-to-event in the case of doubly censored data (related to formula2 and random2). This should be a~list with the same structure as prior.beta.

init2

an optional list with initial values for the MCMC related to the model given by formula2 and random2. The list has the same structure as init.

mcmc.par2

a list specifying how some of the G-spline parameters related to formula2 and random2 are to be updated. The list has the same structure as mcmc.par.

priorinit.Nb

a list specifying the prior of the random intercepts in the case of the AFT model with doubly-interval-censored data and onset, time-to-event random intercepts following bivariate normal distribution.

The list should have the following co

rho

a list specifying possible correlation between the onset random intercept and the time-to-event random intercept in the experimental version of the model. If not given correlation is fixed to $0$.

It is ignored if the argument prior

store

a list of logical values specifying which chains that are not stored by default are to be stored. The list can have the following components. [object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[obje

dir

a string that specifies a directory where all sampled values are to be stored.

Value

A list of class bayessurvreg3 containing an information concerning the initial values and prior choices.

References

$\mbox{Kom\'{a}rek, A.}$ (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen. $\mbox{Kom\'{a}rek}$, A. and Lesaffre, E. (2006). Bayesian accelerated failure time model with multivariate doubly-interval-censored data and flexible distributional assumptions. Submitted. See Komarek_Lesaffre_2006.pdf.

Robert C. P. and Casella, G. (2004). Monte Carlo Statistical Methods, Second Edition. New York: Springer Science+Business Media.

Examples

Run this code

## See the description of R commands for
## the cluster specific AFT model
## with the Signal Tandmobiel data,
## analysis described in Komarek and Lesaffre (2006),

## R commands available in the documentation
## directory of this package
## as tandmobCS.pdf, tandmobCS.R.

Run the code above in your browser using DataLab