negbinBvs: Bayesian variable selection for the negative binomial model

Description

This function performs Bayesian regression modelling of overdispersed count data including variable selection via spike and slab priors. Posterior inference is based on MCMC sampling techniques.

Usage

negbinBvs(y, offset = NULL, X, model = list(), prior = list(), mcmc = list(), start = NULL, BVS = TRUE)

Arguments

an integer vector of count data

offset

an (optional) offset term; should be NULL or an integer vector of length equal to the number of counts.

a design matrix (including an intercept term)

model

an (optional) list specifying the structure of the model (see details)

prior

an (optional) list of prior settings and hyper-parameters controlling the priors (see details)

mcmc

an (optional) list of MCMC sampling options (see details)

start

an (optional), numeric vector containing starting values for the regression effects (including an intercept term); defaults to NULL (i.e. a vector of zeros is used).

BVS

if TRUE (default), Bayesian variable selection is performed to identify regressors with a non-zero effect; otherwise, an unrestricted model is estimated (without variable selection).

Value

samplesNB

pogitprint.pogitsummary.pogitplot.pogita named list containing the samples from the posterior distribution of the parameters in the negative binomial model (see also msave):

beta, rho: regression coefficients $\beta$ and $\rho$
pdeltaBeta: P($\delta_\beta$=1)
psiBeta: scale parameter $\psi_\beta$ of the slab component

data

a list containing the data y, offset and X

model.nb

a list containing details on the model specification, see details for model

mcmc

see details for mcmc

prior.nb

see details for prior

dur

a list containing the total runtime (total) and the runtime after burn-in (durM), in seconds

acc.rho

acceptance rate of parameter $\rho$

BVS

see arguments

start

a list containing starting values, see arguments

family

"negbin"

call

function call

Details

The method provides Bayesian variable selection in regression modelling of overdispersed count data. The negative binomial distribution is derived marginally from a Poisson-Gamma (mixture) model, which can be interpreted as an overdispersed Poisson model with observation-specific random intercept log $\gamma$, where $\gamma|\rho \sim \Gamma(\rho,\rho)$. A hyper-prior for $\rho$ is specified as $\rho \sim \Gamma(c_0,C_0)$, see details for prior below. By default, variable selection is incorporated in the model based on mixture priors with a spike and a slab component for the regression effects $\beta$. More specifically, a Dirac spike is used, i.e. a point mass at zero, and (by default), the slab component is specified as a scale mixture of normal distributions, resulting in a Student-t distribution with 2psi.nu degrees of freedom.

The MCMC sampling scheme relies on the representation of the conditional Poisson model as a Gaussian regression model in auxiliary variables, as described in poissonBvs. Data augmentation is based on the auxiliary mixture sampling algorithm of Fruehwirth-Schnatter et al. (2009). For details concerning the algorithm, see Dvorzak and Wagner (2016b), available on request from the authors.

Details for the model specification (see arguments):

model

deltafix: an indicator vector of length ncol(X)-1 specifying which regression effects are subject to selection (i.e., 0 = subject to selection, 1 = fix in the model); defaults to a vector of zeros.

prior

slab: distribution of the slab component, i.e. "Student" (default) or "Normal".
psi.nu: hyper-parameter of the Student-t slab (used for a "Student" slab); defaults to 5.
m0: prior mean for the intercept parameter; defaults to 0.
M0: prior variance for the intercept parameter; defaults to 100.
aj0: a vector of prior means for the regression effects (which is encoded in a normal distribution, see notes); defaults to vector of zeros.
V: variance of the slab; defaults to 5.
w: hyper-parameters of the Beta-prior for the mixture weight $\omega$; defaults to c(wa0=1, wb0=1), i.e. a uniform distribution.
c0, C0: scale and rate of the gamma prior for the hyper-parameter $\rho$; defaults to 2 and 1.
eps: tuning parameter in the MH-step to sample $\rho$; defaults to 0.05.

mcmc

M

number of MCMC iterations after the burn-in phase; defaults to 8000.

burnin

number of MCMC iterations discarded as burn-in; defaults to 2000.

thin

thinning parameter; defaults to 1.

startsel

number of MCMC iterations drawn from the unrestricted model (e.g., burnin/2); defaults to 1000.

verbose

MCMC progress report in each verbose-th iteration step; defaults to 500. If verbose=0, no output is generated.

msave

returns additional output with variable selection details (i.e. posterior samples for $\omega$, $\delta$); defaults to FALSE.

References

Dvorzak, M. and Wagner, H. (2016b). Bayesian inference for overdispersed count data subject to underreporting - An application to norovirus illness in Germany. (Unpublished) working paper.

Fruehwirth-Schnatter, S., Fruehwirth, R., Held, L. and Rue, H. (2009). Improved auxiliary mixture sampling for hierarchical models of non-Gaussian data. Statistics and Computing, 19, 479 - 492.

Examples

Run this code

## Not run: 
# ## Examples below should take about 1-2 minutes.
# 
# ## ------ (use simul_pois1) ------
# data(simul_pois1)
# y <- simul_pois1$y
# X <- as.matrix(simul_pois1[, -1])
# 
# # Bayesian variable selection for simulated data set
# m1 <- negbinBvs(y = y, X = X)
# 
# # print results (check acceptance rate for 'rho')
# print(m1)
# 
# # re-run with adapted tuning parameter 'eps'
# m2 <- negbinBvs(y = y, X = X, prior = list(eps = 0.4)) 
# 
# # print and summarize results
# print(m2)
# summary(m2)
# 
# # alternatively, compare results to overdispersed Poisson model with 
# # normal random intercept (subject to selection), provided in 'poissonBvs' 
# 
# # specify observation-specific random intercept
# cID <- seq_along(y)
# m3  <- poissonBvs(y = y, X = X, model = list(ri = TRUE, clusterID = cID))
# 
# # print, summarize and plot results
# print(m3)
# summary(m3) 
# # note that thetaB is not selected (!)
# 
# plot(m3, burnin = FALSE, thin = FALSE)
# 
# 
# ## ------ (use data set "azdrg112" from package "COUNT") ------
# 
# if (!requireNamespace("COUNT", quietly = TRUE)){
#  stop("package 'COUNT' is needed for this example to work. 
#        Please install it.")
# }
# 
# library(COUNT)
# # load data set 'azdrg112' 
# # (Arizona Medicare data for DRG (Diagnostic Related Group) 112)
# data(azdrg112) 
# 
# y <- as.numeric(azdrg112$los) # hospital length of stay: 1-53 days
# X <- as.matrix(azdrg112[,-1]) # covariates (gender, type1, age75)
# m4 <- negbinBvs(y = y, X = X, mcmc = list(M = 4000))  
# 
# # print results (check acceptance rate for 'rho')
# print(m4)
# summary(m4)
# plot(m4, burnin = FALSE)
# 
# # adapte tuning parameter eps (and set BVS to FALSE)
# prior <- list(eps = 0.1)
# m5 <- negbinBvs(y = y, X = X, mcmc = list(M = 4000), prior = prior, 
#                 BVS = FALSE)  
# 
# # print, summarize and plot results
# print(m5)
# summary(m5)
# plot(m5, burnin = FALSE, thin = FALSE)
# plot(m5, type = "acf", lag.max = 50)
# ## End(Not run)

Run the code above in your browser using DataLab