BayesSurvcor: The function to perform Bayesian parametric and semi-parametric regression analysis for cluster-correlated univariate time-to-event data.

Description

Univariate regression analysis of clustered time-to-event data is performed using Cox proportional hazards frailty model in a Bayesian framework. The prior of baseline hazard function can be specified by either parametric Weibull or non-parametric mixture of piecewise exponential models (PEM). Either parametric normal or non-parametric Dirichlet process mixture of normals (DPM) can be adopted for the prior of cluster-specific random effects distribution.

Usage

BayesSurvcor(survData, hyperParams, startValues, mcmcParams, numReps, thin, path, 
burninPerc = 0.5, hz.type = "Weibull", re.type = "Normal", storeV = TRUE, nChain = 1)

Arguments

survData

The data frame containing univariate survival data with covariate matrix from n subjects. It is of dimension $n\times(3+p)$, where $p$ is the number of covariates. The first two columns correspond to $y$, $\delta$, cluster membership. Note th

hyperParams

a vector containing hyperparameter values for hierarchical model. For Weibull-Normal, (hz.type="Weibull", re.type="Normal"), it is of length 6; c($a$, $b$, $c$, $d$, $\rho_1$, $\rho_2$). For Weibull-DPM, (hz.type="We

startValues

a list containing vectors of starting values of parameters. The length of list must be equal to the number of chains (nChain). For Weibull-Normal, (hz.type="Weibull", re.type="Normal"), each of list components is of

mcmcParams

a vector containing variables required for Metropolis-Hastings-Green (MHG) algorithm. For Weibull models (Weibull-Normal and Weibull-DPM: hz.type="Weibull"), it is of length 2; c(mhProp_alpha_var, mhProp_V_var). mhProp_alpha_var and mhProp_V

numReps

total number of scans

thin

extent of thinning

path

the name of directory where the results are saved

burninPerc

the proportion of burn-in

hz.type

prior specification for baseline hazard functions : either "Weibull" or "PEM"

re.type

prior specification for cluster-specific random effects distribution: either "Normal" or "DPM"

storeV

logical value to determine whether all the posterior samples of $V$ are to be stored.

nChain

the number of chains

Value

BayesSurvcor returns an object of class BayesSurvcor. names(object$chain1) shows the list of posterior samples of model parameters, the number of acceptance in MH algorithm, etc.

Details

The function BayesSurvcor fits Bayesian semi-parametric (piecewise exponential mixture) and parametric (Weibull) models to cluster-correlated univariate time-to-event data. Let $t_{ji}$ denote time to event of interest from subject $i=1,...,n_j$ in cluster $j=1,...,J$. The covariates $x_{ji}$ are incorporated via Cox proportional hazards model: $$h(t_{ji} | x_{ji}) = h_{0}(t_{ji})\exp(\beta x_{ji} + V_{j}), t_{ji}>0,$$ where $h_0$ is an unspecified baseline hazard function and $\beta$ is a vector of $p$ log-hazard ratio regression parameters. $V_j$s are cluster-specific random effects. For parametric Normal prior specification for a vector of cluster-specific random effects, we assume $V$ arise as i.i.d. draws from a mean 0 Normal distribution with variance $\sigma^2$. Specifically, the priors can be written as follows: $$V_j \sim Normal(0, \sigma^2),$$ $$\sigma^2 \sim inverse-Gamma(\rho_1, \rho_2).$$ For DPM prior specification for $V_j$, we consider non-parametric Dirichlet process mixture of Normal distributions: the $V_j$'s' are draws from a finite mixture of M Normal distributions, each with their own mean and variance, ($\mu_m$, $\sigma_m^2$) for $m=1,...,M$. Let $m_j\in{1,...,M}$ denote the specific component to which the $j$th cluster belongs. Since the class-specific ($\mu_m$, $\sigma_m^2$) are not known they are taken to be draws from some distribution, $G_0$, often referred to as the centering distribution. Furthermore, since the true class memberships are unknown, we denote the probability that the $j$th cluster belongs to any given class by the vector $p=(p_1,..., p_M)$ whose components add up to 1.0. In the absence of prior knowledge regarding the distribution of class memberships for the $J$ clusters across the $M$ classes, a natural prior for $p$ is the conjugate symmetric $Dirichlet(\tau/M,...,\tau/M)$ distribution; the hyperparameter, $\tau$, is often referred to as a the precision parameter. The prior can be represented as follows ($M$ goes to infinity): $$V_j | m_j \sim Normal(\mu_{m_j}, \sigma_{m_j}^2),$$ $$(\mu_m, \sigma_m^2) \sim G_{0},~~ for ~m=1,...,M,$$ $$m_j | p \sim Discrete(m_j| p_1,...,p_M),$$ $$p \sim Dirichlet(\tau/M,...,\tau/M),$$ where $G_0$ is taken to be a multivariate Normal/inverse-Gamma (NIG) distribution for which the probability density function is the following product: $$f_{NIG}(\mu, \sigma^2 | \mu_0, \zeta_0, a_0, b_0) = f_{Normal}(\mu | 0, 1/\zeta_0^2) \times f_{inv-Gamma}(\sigma^2 | a_0, b_0).$$ In addition, we use $Gamma(a_{\tau}, b_{\tau})$ as the hyperprior for $\tau$. For non-parametric prior specification (PEM) for baseline hazard function, let $s_{\max}$ denote the largest observed event time. Then, consider the finite partition of the relevant time axis into $K + 1$ disjoint intervals: $0

where $I(\cdot)$ is the indicator function and $s_0 \equiv 0$. In our proposed Bayesian framework, our prior choices are:

$$\pi(\beta) \propto 1,$$ $$\lambda | K, \mu_{\lambda}, \sigma_{\lambda}^2 \sim MVN_{K+1}(\mu_{\lambda}1, \sigma_{\lambda}^2\Sigma_{\lambda}),$$ $$K \sim Poisson(\alpha),$$ $$\pi(s | K) \propto \frac{(2K+1)! \prod_{k=1}^{K+1}(s_k-s_{k-1})}{(s_{K+1})^{(2K+1)}},$$ $$\pi(\mu_{\lambda}) \propto 1,$$ $$\sigma_{\lambda}^{-2} \sim Gamma(a, b).$$ Note that $K$ and $s$ are treated as random and the priors for $K$ and $s$ jointly form a time-homogeneous Poisson process prior for the partition. The number of time splits and their positions are therefore updated within our computational scheme using reversible jump MCMC. For parametric Weibull prior specification for baseline hazard function, $h_{0}(t) = \alpha \kappa t^{\alpha-1}$. In our Bayesian framework, our prior choices are: $$\pi(\beta) \propto 1,$$ $$\pi(\alpha) \sim Gamma(a, b),$$ $$\pi(\kappa) \sim Gamma(c, d).$$

References

Lee, K. H., Haneuse, S., Schrag, D., and Dominici, F. (2014), Bayesian semiparametric analysis of semicompeting risks data: investigating hospital readmission after a pancreatic cancer diagnosis, Journal of the Royal Statistical Society: Series C, in press. Lee, K. H., Dominici, F., Schrag, D., and Haneuse, S., Hierarchical models for cluster-correlated semicompeting risks data with application to quality of end-of-life care for pancreatic cancer, arXiv:1502.00526; submitted.

Examples

Run this code

# loading a data set	
data(scrCorData)

survData <- scrCorData[,3:8]
colnames(survData)[c(1,2)] <- c("time", "event")

n = dim(survData)[1]
p = dim(survData)[2]-3
J <- length(unique(survData$cluster))

###############################
# setting hyperparameter values


a		<- 0.7	# prior parameters for 1/sigma_1^2
b		<- 0.7
alpha	<- 10	# prior parameter for K
c_lam 	<- 1	# prior parameter for MVN-ICAR specification

rho1 	<- 0.5 # prior for zeta
rho2 	<- 0.01

hyperParams <- c(a, b, alpha, c_lam, rho1, rho2)



#########################
# setting starting values

s_max		<- max(survData$time[survData$event == 1])

grid <- s_max/50

beta		<- rep(0.1, p)
s			<- unique(sort(c(sample(1:s_max, 10), s_max)))
K			<- length(s) - 1
lambda		<- runif(K+1, -4, -3)
sigSq_lam	<- var(lambda)
mu_lam		<- mean(lambda)
V	<- rep(0, J)
zeta <- 0.5



# chain 1

startValues <- list()
startValues[[1]] <- as.vector(c(beta, K, mu_lam, sigSq_lam, lambda, s, V, zeta))

# chain 2

beta		<- rep(0.2, p)
lambda		<- runif(K+1, -4.1, -3)
sigSq_lam	<- var(lambda)
mu_lam		<- mean(lambda)
V	<- rep(0.1, J)
zeta <- 0.7

startValues[[2]] <- as.vector(c(beta, K, mu_lam, sigSq_lam, lambda, s, V, zeta))

##################################################
# setting variable values needed for MH algorithm

C				<- 0.20
delPert		<- 0.5
s_propBI	<- floor(sort(unique(survData$time[survData$event == 1])))
s_propBI	<- s_propBI[s_propBI < s_max]
num_s_propBI	<- length(s_propBI)
K_max 			<- 50
time_lambda	<- seq(grid, s_max, grid)
nTime_lambda	<- length(time_lambda)
mhProp_V_var <- 0.05


mcmcParams <- c(C, delPert, num_s_propBI, K_max, s_max, nTime_lambda, s_propBI, time_lambda, 
mhProp_V_var)





##################################################
# number of chains

numReps	= 2e6
thin	= 20
burninPerc	= 0.5
path1	= "outcome/"
hz.type = "PEM"
re.type     = "Normal"
storeV      = TRUE
nChain	= 2

# fitting Bayesian semi-parametric regression model to univariate survival data	

fit <- BayesSurvcor(survData, hyperParams, startValues, mcmcParams, numReps, thin, path1, 
burninPerc, hz.type, re.type, storeV, nChain)
		
print(fit)
summary(fit)		
		
## plot for estimates of baseline hazard function
plot(fit)

Run the code above in your browser using DataLab