lmeNBBayes: Generate posterior samples from a flexible mixed effect negative binomial regression model.

Description

Let $Y_{ij}$ be the response count at $j$th repeated measure from the $i$th patient ($i=1,\cdots,N$ and $j=1,\cdots,n_i$). The negative binomial mixed-effect independent model assumes that given the random effect $G[i]=g[i]$, the count response from the same subjects i.e., $Y_{ij}$ and $Y_{ij'}$ are conditionally independent and follow the negative binomial distribution:

$ Y[ij] | G[i]=g[i], beta i.i.d.~ NB(Y[ij]; size=exp(X[ij]*beta),prob=g[i]) $

where $\boldsymbol{X}_{ij}$ is the covariates for mean counts. This formulation results in $\log E(Y_{i,j}) = \log(\mu_{\frac{1}{G}} - 1 ) + \boldsymbol{X}_{ij}^T\boldsymbol{\beta}$. To allow flexible form of a random effect distribution, we assume that the patient-specific random effect is assumed to be from Dirichlet process mixture of beta distributions. This essentially means that random effect $G[i]$ is from an infinite mixture of Beta distributions:

$ G[i]| { a[G[h]], r[G[h]], pi[h] }[h=1]^{infty} ~ sum pi[h] Beta(G[i]; shape1=a[G[h]],shape2=r[G[h]]) $,

where $pi[h]$ is modelled with the stick-breaking prior. Introducing latent variable $V[h],h=1,2,...$, this prior is defined as $pi[1]=V[1]$ and $ pi[h]=V[h]prod[l < h] (1-V[h]) $ for $h>1$ $V[h] i.i.d. ~ Beta(1,D)$.

The rest of priors are specified as: $ beta ~ N(mu,Sigma) $,

$ (a[G],r[G]) ~ Unif(a[G];\textrm{min}=0.5,\textrm{max}=max[a[G]])Unif(r[G];\textrm{min}=0.5, \textrm{max}=max[r[G]]) $,

$ D ~ Unif(v;min=a_D,max=ib_D) $.

The default values of the hyperparameters are $mu[beta]$ = rep(0,p), $Sigma[beta]$ = diag(5,p), $max[a[G]]$ = 30, $a[D]$ = 0.01 and $ib[D]$ = 3. These selections of hyperparameters could be used as uninformative ones. The function lmeNBBayes also allows generating posterior samples from the parametric version of the model above which simply assumes that the random effect is from the single beta distribution. (The rest of the prior specifications are the same).

Usage

lmeNBBayes(formula,data, ID, B = 105000, burnin = 5000,   printFreq = B, M = NULL, probIndex = FALSE, thin =1,labelnp=NULL, epsilonM = 1e-4, 
	   para = list(mu_beta = NULL,Sigma_beta = NULL,
	   	       max_aG=30,mu_lnD=NULL,sd_lnD=NULL), DP=TRUE,thinned.sample=FALSE, proposalSD = NULL)

Arguments

formula

An object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The formula must contain an intercept term.

data

A data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. The each row must contains the data corresponding to the repeated measure $j$ of subjects and the rows $(i,j)$s must be ordered in a way that measurements from a subject is clustered together as $(1,1)$,...,$(1,n[1])$,$(2,1)$,...,$(2,n[2])$,...,$(N,n[N])$.

A vector of length $ \sum[i=1]^N n[i] $, containing the patient IDs that corresponds to data. i.e., c(rep(ID_1,n_1),rep(ID_2,n_2),...,rep(ID_N,n_N)). The length must be the same as the number of rows of data. Missing ID values are NOT accepted.

A scalar, the number of McMC iterations.

burnin

A scalar for a burn-in period. The proposal variance of the Metoropolice-Hasting rate is adjusted during the burn-in preiod.

printFreq

An integer value, indicating the frequency of iterations to print during the McMC run.

Necessary only if DP=1. Our Gibbs sampler approximates the infinite mixture of beta distributions by truncating it with M components by setting $V[M]=1$ so that $pi_M = 1 - \sum_{h=1}^{M-1} \pi_h$. If M is NULL, M is selected so that the amount of probability assigned to the final mass point is expected to be epsilonM. i.e., $E(\pi_M)=E\{ (\pi_M | D )\} = E (\{ 1- 1/(D+1)\}^{M-1}) < \epsilon$.

probIndex

Logical, if it is TRUE then the conditional probability index is computed for each patient at every thinning after discarding burn-in.

thin

Thinning frequency. Necessary if probIndex is TRUE.

labelnp

A vector of length $ \sum[i=1]^N n[i] $, containing 0 or 1. Zero indicates that the corresponding repeated measure should be treated as pre-scan and 1 indicates that it is a new scan. labelnp is necessary only if probIndex is TRUE.

epsilonM

A scalar. See the description of M.

para

A list containing hyperparameter values. If DP=0 then the followings must be specified: mu_beta (a vector of length $p$), Sigma_beta (a $p$ by $p$ covariance matrix) and max_aG (a positive scaler). If DP=1 then in addition to the above parameters,mu_lnD (positive scaler) sd_lnD (positive scaler) must be specified. If some of these are not specified then the default values discussed in description are used.

If DP=1 then the flexible mixed effect negative binomial regression is fit to the dataset. If DP=0 then the random effect distribution is assumed to be a single beta distribution.

thinned.sample

Logical. If true then return only the thinned samples, else returns the entire MCMC sample of size B.

proposalSD

List object containing two list objects min and max, which contain minimum and maximum values of the proposal standard deviations.

If DP=0 then a list object min (max) must contains 3 elements corresponding to minimum (maximum) values of the proposal standard deviation of aG, rG and beta. See details for beta.

If DP=1 then a list object min (max) must contains 4 elements corresponding to minimum (maximum) values of the proposal standard deviation of aG, rG, beta and ln D. See details for beta.

Details

For the parameters with non-conjugate priors $beta,D,a[G],b[G]$, the Metropolis Hasting (MH) algorithm is employed to sample from their full conditional distributions. For $D,a[G],b[G]$, the MH algorithm is performed separately with a normal proposal distribution, where its proposal variance is tuned during the burn-in to have the acceptance rates range between 0.2 and 0.6. One can adjust the minimum and maximum of the proposal sd via the proposalSD arguments. For each element of $beta$, we found that updating each regression coefficient with separate MH algorithm resulted in poor mixing in the Markov chain when high correlation is assumed in some of $beta$ in the prior. Therefore, the MH algorithm is performed simultaneously for all $beta$ and a MVN proposal distribution is employed with $a$$Sigma$ as its proposal covariance matrix, where $Sigma$ is the covariance of a prior for $beta$ and $a$ is a tuning scaler adjusted during the burn-in period to have the acceptance rates range between 0.2 and 0.6.

References

Kondo, Y., Zhao, Y. and Petkau, A.J., "A flexible mixed effect negative binomial regression model for detecting unusual increases in MRI lesion counts in individual multiple sclerosis patients".

Examples

Run this code


## Not run: 
# 
# ## generate samples from DSMSB review 2
# d <- getS.StatInMed(rev=2,iseed=1,dist="YZ",Scenario="full")
# formula.fit <- Y ~ timeInt1:trtAss + timeInt2:trtAss
# 
# B <- 10000
# burnin <- 1000
# thin <- 2
# fit <- lmeNBBayes(formula=formula.fit,data=d, ID=d$ID, 
#                   B = B, burnin = burnin,  thin=thin)
# ## The output can be printed out:
# fit 
# 
# 
# ## Now, compute the conditional probability index using the mean function of placebo patients.
# ## We need to modify two things in output of lmeNBBayes.
# ## 1st, change the formula so that it does not distinguish between treatment and placebo
# fit$para$formula <- Y ~ timeInt1 + timeInt2
# ## 2nd, disregard the coefficient that corresponds to the treated patients
# fit$beta <- fit$beta[,-c(3,5)]
# cpi <- index.batch.Bayes(data=d,labelnp=d$labelnp,ID=d$ID,
#                          olmeNBB=fit,printFreq=10^7)
# cpi 
# 
# ## finally access the accuracy of the CPI estimates in terms of RMSE
# Npat <- length(unique(d$ID))
# est <- cpi$condProbSummary[,1]
# true <- d$probIndex[1:Npat]
# sqrt( mean( ( est - true )^2 ,na.rm=TRUE) )
#               
# 
# ## End(Not run)

Run the code above in your browser using DataLab