For details, see Komárek (2006) and Komárek and Lesaffre (2006). We first explain in more detail the model for data that are not doubly censored. Let $T_{i,l},\ i=1,\dots,N,\ l=1,2$ be the event times, where $i$ indexes the cluster and $l$ the unit within the cluster. The following regression model is assumed: $$\log(T_{i,l}) = \beta'x_{i,l} + \varepsilon_{i,l},\quad i=1,\dots,N,\;l=1,2$$ where $\beta$ is an unknown vector of regression parameters and $x_{i,l}$ is a vector of covariates. The bivariate error terms $\varepsilon_i = (\varepsilon_{i,1}, \varepsilon_{i,2})',\ i=1,\dots,N$ are assumed to be i.i.d. with a bivariate density $g_{\varepsilon}(e_1, e_2)$. This density is expressed as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variance matrices). We distinguish two, theoretically equivalent, specifications.
Personally, I found that Specification 2 performs better. Note that only Specification 2 is described in the paper Komárek and Lesaffre (2006).
The mixture weights $w_{j_1,j_2},\ j_1=-K_1,\dots,K_1,\ j_2=-K_2,\dots,K_2$ are not estimated directly. To avoid the constraints $0 < w_{j_1,j_2} < 1$ and $\sum_{j_1=-K_1}^{K_1}\sum_{j_2=-K_2}^{K_2} w_{j_1,j_2} = 1$, transformed weights $a_{j_1,j_2},\ j_1=-K_1,\dots,K_1,\ j_2=-K_2,\dots,K_2$, related to the original weights by the logistic transformation $$w_{j_1,j_2} = \frac{\exp(a_{j_1,j_2})}{\sum_{m_1=-K_1}^{K_1}\sum_{m_2=-K_2}^{K_2}\exp(a_{m_1,m_2})},$$ are estimated instead.
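As a small illustration of this transformation (purely schematic and not part of the package; the grid sizes and values of the transformed weights below are arbitrary), the mixture weights can be recovered from the transformed weights in R as follows:

## logistic (softmax) transformation of a matrix of transformed
## weights a[j1, j2] into valid mixture weights w[j1, j2]
K1 <- 2; K2 <- 2                                   # arbitrary illustrative values
a <- matrix(rnorm((2*K1 + 1) * (2*K2 + 1)), nrow = 2*K1 + 1)
w <- exp(a) / sum(exp(a))                          # 0 < w < 1, sum(w) == 1
sum(w)                                             # equals 1 up to rounding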
A Bayesian model is set up for all unknown parameters. For more details I refer to Komárek and Lesaffre (2006) and to Komárek (2006).
In the case of doubly-censored data, a model of the same type as above can be specified for both the onset time and the time-to-event.
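To make the model concrete, the following sketch simulates interval-censored data of this type in plain R (all parameter values, the two-component normal mixture used as the error density, and the censoring mechanism are illustrative assumptions, not part of the package):

set.seed(1)
N    <- 100
beta <- 0.5                                        # illustrative regression coefficient
x    <- matrix(rbinom(2 * N, 1, 0.5), ncol = 2)    # one binary covariate per unit

## illustrative error density: a two-component normal mixture
## (a G-spline is a richer mixture of this kind; for simplicity
## the two margins are simulated independently here)
comp <- rbinom(2 * N, 1, 0.4)
eps  <- matrix(rnorm(2 * N, mean = ifelse(comp == 1, -1, 1), sd = 0.5), ncol = 2)

logT <- beta * x + eps                             # AFT model: log(T[i,l]) = beta'x[i,l] + eps[i,l]
Tmat <- exp(logT)

## interval censoring: each time is only known to lie between two visits
low <- floor(Tmat)
upp <- low + 1

Responses of this kind would enter the model formula as, e.g., Surv(low, upp, type = "interval2"), with the two units of each cluster stacked in consecutive rows (a layout assumption here; see the remark on the sort order under formula below).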
bayesBisurvreg(formula, formula2, data = parent.frame(),
               na.action = na.fail, onlyX = FALSE,
               nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10),
               prior, prior.beta, init = list(iter = 0),
               mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1,
                               k.overrelax.sigma = 1, k.overrelax.scale = 1),
               prior2, prior.beta2, init2,
               mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1,
                                k.overrelax.sigma = 1, k.overrelax.scale = 1),
               store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE,
                            r = FALSE, r2 = FALSE),
               dir = getwd())
formula: model formula for the regression; in the case of doubly-censored data, the model formula for the onset time. The left-hand side of the formula must be an object created using Surv. Because the observations enter the model in pairs (the two units of each cluster), the sort order of the rows of the data matters.

formula2: model formula for the regression of the time-to-event in the case of doubly-censored data; ignored otherwise. The same remark as for formula concerning the sort order applies here.

data: optional data frame in which to interpret the variables occurring in the formulas.
na.action: how to handle missing values in the data; the default is na.fail.

onlyX: if TRUE, no MCMC sampling is performed and only the design matrix (matrices) are returned. This can be useful for setting up correct priors for the regression parameters in the presence of factor covariates.
nsimul: a list giving the parameters of the MCMC simulation: niter, the total number of MCMC iterations; nthin, the thinning interval; nburn, the length of the burn-in period, which must be smaller than niter (if not, nburn is set to niter - 1; it can be set to zero); and nwrite, the frequency with which sampled values are written to files (see below).
prior: specification of the prior distribution of the G-spline defining the distribution of the error term in the model given by formula. See the prior argument of the bayesHistogram function for more detail. The Specification described above is also chosen in this list.
prior.beta: prior specification for the regression parameters of the model given by formula (in the case of doubly-censored data, for the regression parameters of the onset time). This should be a list with the following components: a vector of prior means, one for each beta parameter in the model, and a vector of prior variances, one for each beta parameter. It is recommended to run the function bayesBisurvreg first with its argument onlyX set to TRUE to find out how the betas are sorted; they must correspond to the design matrix X taken from formula (a short sketch of such a run is given after this argument list).
init: an optional list of initial values for the MCMC run related to the model given by formula. Among others, the list can have the following components: iter, the index of the iteration to which the initial values correspond (default 0); beta, initial values of the regression parameters (use onlyX = TRUE if you do not know how the columns of the design matrix are created); gamma, the initial middle knots: if Specification is 2, set init$gamma to zero for all dimensions (default behavior), whereas if Specification is 1, init$gamma should be approximately equal to the mean value of the residuals in each margin; sigma, the initial basis standard deviations: if Specification is 1, these should be approximately equal to the range of the residuals divided by the number of knots in each margin and multiplied again by something like 2/3; scale, the initial scale parameters: if Specification is 1, this value is not changed by the MCMC and the initial value is always changed to one for both dimensions. See the argument init of the function bayesHistogram for more details.
mcmc.par: a list specifying how the G-spline parameters related to formula are to be updated. The list can have the following components (all of them have their default values): type.update.a, the method used to update the transformed mixture weights, where the default is "slice", i.e. slice sampling (Neal, 2003); k.overrelax.a: if type.update.a == "slice", some updates are overrelaxed, and then every k.overrelax.a-th iteration is not overrelaxed (default is k.overrelax.a = 1, i.e. no overrelaxation); k.overrelax.sigma, the analogous setting for the update of the basis standard deviations (default is k.overrelax.sigma = 1, i.e. no overrelaxation); k.overrelax.scale, the analogous setting for the update of the scale parameters (default is k.overrelax.scale = 1, i.e. no overrelaxation).
prior2: specification of the prior distribution of the G-spline defining the distribution of the error term in the model given by formula2. See the prior argument of the bayesHistogram function for more detail (this argument is used only in the case of doubly-censored data, i.e. together with formula2).

prior.beta2: prior specification for the regression parameters of the time-to-event model given by formula2. This should be a list with the same structure as prior.beta.

init2: an optional list of initial values for the MCMC run related to the model given by formula2. The list has the same structure as init.

mcmc.par2: a list specifying how the G-spline parameters related to formula2 are to be updated. The list has the same structure as mcmc.par.
store: a list of logical values determining which of the sampled values are to be stored in full. Its components are:

a: if TRUE, then all the transformed mixture weights $a_{k_1,k_2}$, $k_1=-K_1,\dots,K_1$, $k_2=-K_2,\dots,K_2$, related to the G-spline of formula are stored.

a2: if TRUE and there are doubly-censored data, then all the transformed mixture weights $a_{k_1,k_2}$, $k_1=-K_1,\dots,K_1$, $k_2=-K_2,\dots,K_2$, related to the G-spline of formula2 are stored.

y: if TRUE, then the augmented log-event times for all observations related to formula are stored.

y2: if TRUE, then the augmented log-event times for all observations related to formula2 are stored.

r: if TRUE, then the labels of the mixture components for the residuals related to formula are stored.

r2: if TRUE, then the labels of the mixture components for the residuals related to formula2 are stored.

dir: a string giving the directory in which the files with sampled values are to be stored (see below); the default is the current working directory.
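For instance, the recommended onlyX run mentioned above under prior.beta could look as follows (a hypothetical sketch: the data frame regdata and the covariates low, upp, trt and grp are made up for illustration):

library(bayesSurv)
library(survival)

## return only the design matrix (no MCMC is performed)
des <- bayesBisurvreg(Surv(low, upp, type = "interval2") ~ trt + factor(grp),
                      data = regdata, onlyX = TRUE)
## inspect the returned design matrix (matrices) to see how the betas
## are sorted before filling in the prior means and variances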
The function returns an object of class bayesBisurvreg containing information concerning the initial values and the prior choices.
Additionally, files with sampled values are stored in the directory specified by the dir argument of this function (some of them are created only on request; see the store argument of this function). Headers are written to all files created by default and to the files requested by the user via the argument store. During the burn-in, only every nsimul$nwrite-th value is written. After the burn-in, all sampled values are written to the files created by default and to the files requested by the user via the argument store. In the files for which the corresponding store component is FALSE, every nsimul$nwrite-th value is written during the whole MCMC (this might be useful to restart the MCMC from some specific point). The following files are created (a small example of reading them back into R is given after this list):
A file with a single column labeled iteration contains the indices of the MCMC iterations to which the stored sampled values correspond.
mixmoment.sim: columns labeled k, Mean.1, Mean.2, D.1.1, D.2.1, D.2.2, where k = the number of mixture components that had a probability numerically higher than zero; Mean.1 = $E(\varepsilon_{i,1})$; Mean.2 = $E(\varepsilon_{i,2})$; D.1.1 = $\mathrm{var}(\varepsilon_{i,1})$; D.2.1 = $\mathrm{cov}(\varepsilon_{i,1}, \varepsilon_{i,2})$; D.2.2 = $\mathrm{var}(\varepsilon_{i,2})$; all related to the distribution of the error term of the model given by formula.
In the case of doubly-censored data, a further file with the same structure as mixmoment.sim is created, however related to the model given by formula2.
mweight.sim: the sampled weights $w_{k_1,k_2}$ of the mixture components that had probabilities numerically higher than zero, related to the model given by formula.
In the case of doubly-censored data, a further file with the same structure as mweight.sim is created, however related to the model given by formula2.
mmean.sim: the double indices $(k_1, k_2)$ of the mixture components that had probabilities numerically higher than zero; the file has the same structure as mweight.sim. Related to the model given by formula.
In the case of doubly-censored data, a further file with the same structure as mmean.sim is created, however related to the model given by formula2.
gspline.sim: characteristics of the sampled G-spline (the distribution of the error terms) related to the model given by formula. This file, together with mixmoment.sim, mweight.sim and mmean.sim, can be used to reconstruct the G-spline in each MCMC iteration. The file has columns labeled gamma1, gamma2, sigma1, sigma2, delta1, delta2, intercept1, intercept2, scale1, scale2, with the following meaning: gamma1 = the middle knot $\gamma_1$ in the first dimension (if Specification is 2, this column usually contains zeros); gamma2 = the middle knot $\gamma_2$ in the second dimension (if Specification is 2, this column usually contains zeros); sigma1 = the basis standard deviation $\sigma_1$ of the G-spline in the first dimension (this column contains a fixed value if Specification is 2); sigma2 = the basis standard deviation $\sigma_2$ of the G-spline in the second dimension (this column contains a fixed value if Specification is 2); delta1 = the distance $\delta_1$ between two consecutive knots of the G-spline in the first dimension (this column contains a fixed value if Specification is 2); delta2 = the distance $\delta_2$ between two consecutive knots of the G-spline in the second dimension (this column contains a fixed value if Specification is 2); intercept1 = the intercept term $\alpha_1$ of the G-spline in the first dimension (if Specification is 1, this column usually contains zeros); intercept2 = the intercept term $\alpha_2$ of the G-spline in the second dimension (if Specification is 1, this column usually contains zeros); scale1 = the scale parameter $\tau_1$ of the G-spline in the first dimension (if Specification is 1, this column usually contains ones); scale2 = the scale parameter $\tau_2$ of the G-spline in the second dimension (if Specification is 1, this column usually contains ones).
In the case of doubly-censored data, a further file with the same structure as gspline.sim is created, however related to the model given by formula2.
mlogweight.sim: created only if store$a = TRUE. The file contains the transformed weights $a_{k_1,k_2}$, $k_1=-K_1,\dots,K_1$, $k_2=-K_2,\dots,K_2$, of all mixture components, i.e. also of the components that had numerically zero probabilities. This file is related to the model given by formula.
A further file, created only if store$a2 = TRUE and in the case of doubly-censored data, has the same structure as mlogweight.sim, however related to the model given by formula2.
r.sim: created only if store$r = TRUE. The file contains the labels of the mixture components into which the residuals are intrinsically assigned. Instead of the double indices $(k_1, k_2)$, values from 1 to $(2K_1+1)(2K_2+1)$ are stored here. The function vecr2matr can be used to transform them back to double indices.
A further file, created only if store$r2 = TRUE and in the case of doubly-censored data, has the same structure as r.sim, however related to the model given by formula2.
lambda.sim: either one column labeled lambda or two columns labeled lambda1 and lambda2. These are the values of the smoothing parameter(s) $\lambda$ (hyperparameters of the prior distribution of the transformed mixture weights $a_{k_1,k_2}$). This file is related to the model given by formula.
In the case of doubly-censored data, a further file with the same structure as lambda.sim is created, however related to the model given by formula2.
beta.sim: the sampled values of the regression parameters $\beta$ of the model given by formula. The columns are labeled according to the colnames of the design matrix.
In the case of doubly-censored data, a further file with the same structure as beta.sim is created, however related to the model given by formula2.
Y.sim: created only if store$y = TRUE. It contains the sampled (augmented) log-event times for all observations in the data set.
A further file, created only if store$y2 = TRUE and in the case of doubly-censored data, has the same structure as Y.sim, however related to the model given by formula2.
A further file has columns labeled loglik, either penalty or both penalty1 and penalty2, and logprw. This file is related to the model given by formula. The columns have the following meaning. loglik is given by $$\mathtt{loglik} = -N\bigl\{\log(2\pi) + \log(\sigma_1) + \log(\sigma_2)\bigr\} - \frac{1}{2}\sum_{i=1}^{N}\Bigl\{(\sigma_1^2\tau_1^2)^{-1}\bigl(y_{i,1} - x_{i,1}'\beta - \alpha_1 - \tau_1\mu_{1,r_{i,1}}\bigr)^2 + (\sigma_2^2\tau_2^2)^{-1}\bigl(y_{i,2} - x_{i,2}'\beta - \alpha_2 - \tau_2\mu_{2,r_{i,2}}\bigr)^2\Bigr\},$$ where $y_{i,l}$ denotes the (augmented) $(i,l)$th true log-event time. In other words, loglik is equal to the conditional log-density $$\sum_{i=1}^{N}\log\bigl(p\bigl((y_{i,1},\,y_{i,2})\,\big|\,r_i,\,\beta,\,\text{G-spline}\bigr)\bigr).$$ penalty1: if prior$neighbor.system = "uniCAR", the penalty term for the first dimension, not multiplied by lambda1. penalty2: if prior$neighbor.system = "uniCAR", the penalty term for the second dimension, not multiplied by lambda2. penalty: if prior$neighbor.system is different from "uniCAR", the penalty term, not multiplied by lambda. logprw is given by $$\mathtt{logprw} = -2N\log\Bigl(\sum_{k_1}\sum_{k_2}\exp(a_{k_1,k_2})\Bigr) + \sum_{k_1}\sum_{k_2} N_{k_1,k_2}\,a_{k_1,k_2},$$ where $N_{k_1,k_2}$ is the number of residuals intrinsically assigned to the $(k_1,k_2)$th mixture component. In other words, logprw is equal to the conditional log-density $$\sum_{i=1}^{N}\log\bigl(p(r_i\,|\,\text{G-spline weights})\bigr).$$
In the case of doubly-censored data, a further file with the same structure as the file just described is created, however related to the model given by formula2.
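As promised above, a small example of reading the sampled values back into R (a sketch assuming the default dir = getwd() and the column labels of mixmoment.sim described above):

## read the sampled mixture moments back into R
mix <- read.table("mixmoment.sim", header = TRUE)

## e.g. posterior samples of the error correlation in the model
## given by formula:
rho <- mix$D.2.1 / sqrt(mix$D.1.1 * mix$D.2.2)
summary(rho)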
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.

Komárek, A. and Lesaffre, E. (2006). Bayesian semi-parametric accelerated failure time model for paired doubly interval-censored data. Statistical Modelling, 6, 3--22.

Neal, R. M. (2003). Slice sampling (with Discussion). The Annals of Statistics, 31, 705--767.
## See the R commands for the population-averaged AFT model
## fitted to the Signal Tandmobiel data, i.e. the analysis
## described in Komarek and Lesaffre (2006).
##
## The commands are available in the documentation directory
## of this package as ex-tandmobPA.R, with a compiled version at
## http://www.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobPA.pdf
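
## In addition, a small self-contained sketch of what vecr2matr does with
## the labels stored in r.sim: it converts single indices
## 1 ... (2*K1+1)*(2*K2+1) back to double indices (k1, k2).
## The ordering assumed below (k1 changing fastest) is an illustrative
## assumption; use vecr2matr for the package's actual ordering.
K1 <- 15; K2 <- 15                 # illustrative numbers of knots on each side
r  <- c(1, 496, 961)               # illustrative single component labels
k1 <- (r - 1) %% (2*K1 + 1) - K1
k2 <- (r - 1) %/% (2*K1 + 1) - K2
cbind(k1, k2)                      # rows give (k1, k2) for each label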