logit.spike: Spike and slab logistic regression

Description

MCMC algorithm for logistic regression models with a 'spike-and-slab' prior that places some amount of posterior probability at zero for a subset of the regression coefficients.

Usage

logit.spike(formula,
            niter,
            data,
            subset,
            prior = NULL,
            na.action = options("na.action"),
            contrasts = NULL,
            drop.unused.levels = TRUE,
            initial.value = NULL,
            ping = niter / 10,
            nthreads = 0,
            clt.threshold = 2,
            mh.chunk.size = 10,
            proposal.df = 3,
            seed = NULL,
            ...)

Arguments

formula

Formula for the maximal model (with all variables included). It is parsed the same way as in a call to glm, but no family argument is needed. Like glm

niter

The number of MCMC iterations to run. Be sure to include enough so you can throw away a burn-in set.

data

An optional data frame, list or environment (or object coercible by 'as.data.frame' to a data frame) containing the variables in the model. If not found in 'data', the variables are taken from 'environment(formula)', typically the environment

subset

An optional vector specifying a subset of observations to be used in the fitting process.

prior

A list such as that returned by SpikeSlabPrior. If prior is supplied it will be used. Otherwise a prior distribution will be built using the remaining arguments. See

na.action

A function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The factory-fresh

contrasts

An optional list. See the contrasts.arg of model.matrix.default.

drop.unused.levels

A logical value indicating whether factor levels that are unobserved should be dropped from the model.

initial.value

Initial value for the MCMC algorithm. Can either be a numeric vector, a glm object (from which the coefficients will be used), or a logit.spike obje

ping

If positive, then print a status update to the console every ping MCMC iterations.

nthreads

The number of CPU-threads to use for data augmentation. There is some small overhead to stopping and starting threads. For small data sets, thread overhead will make it faster to run single threaded. For larger data sets multi-threading can

clt.threshold

When the model is presented with binomial data (i.e. when the response is a two-column matrix) the data augmentation algorithm can be made more efficient by updating a single, asymptotically normal scalar quantity for each unique value of

mh.chunk.size

The maximum number of coefficients to draw in a single "chunk" of a Metropolis-Hastings update. See details.

proposal.df

The degrees of freedom parameter to use in Metropolis-Hastings proposals. See details.

seed

Seed to use for the C++ random number generator. It should be NULL or an int. If NULL the seed value will be taken from the global .Random.seed object.

...

Extra arguments to be passed to SpikeSlabPrior.

Value

Returns an object of class logit.spike, which inherits from lm.spike. The returned object is a list with the following elements:

beta

A niter by ncol(x) matrix of regression coefficients, many of which may be zero. Each row corresponds to an MCMC iteration.

prior

The prior used to fit the model. If a prior was supplied as an argument it will be returned. Otherwise this will be the automatically generated prior based on the other function arguments.
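
Because each row of beta is one MCMC draw and excluded coefficients are stored as exact zeros, the matrix can be summarized with base R alone. The sketch below uses a simulated stand-in for model$beta so that it runs without fitting a model. The column names, the burn-in length, and the inclusion rates are illustrative assumptions, not values produced by logit.spike.

```r
# Simulated stand-in for model$beta: niter rows (draws) by p columns.
set.seed(1)
niter <- 1000
p <- 4
beta <- matrix(rnorm(niter * p), nrow = niter, ncol = p,
               dimnames = list(NULL, c("(Intercept)", "x1", "x2", "x3")))
# Mimic variable selection: x2 is included in about 30% of draws, x3 never.
beta[, "x2"] <- beta[, "x2"] * rbinom(niter, 1, 0.3)
beta[, "x3"] <- 0

# Discard an assumed burn-in of 200 iterations.
burn <- 200
kept <- beta[-seq_len(burn), , drop = FALSE]

# Marginal inclusion probability: share of kept draws that are nonzero.
inclusion.prob <- colMeans(kept != 0)

# Model-averaged posterior mean: zeros count toward the average.
posterior.mean <- colMeans(kept)

print(round(inclusion.prob, 2))
```

With a fitted object, the same two colMeans calls would be applied to model$beta after dropping the burn-in rows.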

Details

Model parameters are updated using a composite of three Metropolis-Hastings updates. The first is an auxiliary mixture sampling algorithm (Tuchler 2008), which updates the entire parameter vector at once but can mix slowly.

The second algorithm is a random walk Metropolis update based on a multivariate T proposal with proposal.df degrees of freedom. If proposal.df is nonpositive then a Gaussian proposal is used. The variance of the proposal distribution is based on the Fisher information matrix evaluated at the current draw of the coefficients.
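
The random walk update can be illustrated with a minimal base R sketch for a fixed set of included coefficients. It is not the package's C++ implementation. The function names (log_post, fisher_info, log_q, rw_step) are hypothetical, and the independent Gaussian prior is an assumed stand-in for the slab. Because the proposal variance depends on the current draw, the proposal is not symmetric, so this sketch includes the proposal densities in the acceptance ratio.

```r
# Log posterior: logistic likelihood plus independent N(0, prior.sd^2) priors.
# The Gaussian prior is an assumption made for this sketch.
log_post <- function(beta, X, y, prior.sd = 10) {
  eta <- drop(X %*% beta)
  sum(plogis(ifelse(y == 1, eta, -eta), log.p = TRUE)) +
    sum(dnorm(beta, 0, prior.sd, log = TRUE))
}

# Fisher information X' W X with W = diag(p * (1 - p)).
fisher_info <- function(beta, X) {
  prob <- plogis(drop(X %*% beta))
  crossprod(X * sqrt(prob * (1 - prob)))
}

# Log proposal density (up to a constant) of moving from `from` to `to`.
# Multivariate T if df > 0, Gaussian otherwise.
log_q <- function(to, from, X, df) {
  d <- length(from)
  info <- fisher_info(from, X)
  delta <- to - from
  quad <- drop(crossprod(delta, info %*% delta))
  logdet <- as.numeric(determinant(info, logarithm = TRUE)$modulus)
  if (df > 0) {
    0.5 * logdet - 0.5 * (df + d) * log1p(quad / df)
  } else {
    0.5 * logdet - 0.5 * quad
  }
}

# One random walk Metropolis-Hastings step.
rw_step <- function(beta, X, y, df = 3) {
  R <- chol(fisher_info(beta, X))            # info = R'R
  step <- backsolve(R, rnorm(length(beta)))  # covariance = solve(info)
  if (df > 0) step <- step / sqrt(rchisq(1, df) / df)
  cand <- beta + step
  log_ratio <- log_post(cand, X, y) - log_post(beta, X, y) +
    log_q(beta, cand, X, df) - log_q(cand, beta, X, df)
  # isTRUE() treats a non-finite ratio as a rejection.
  if (isTRUE(log(runif(1)) < log_ratio)) cand else beta
}

# Usage on simulated data.
set.seed(1)
n <- 200
X <- cbind(1, rnorm(n), rnorm(n))
y <- rbinom(n, 1, plogis(drop(X %*% c(-0.5, 1, 0))))
draws <- matrix(NA_real_, 300, ncol(X))
beta <- rep(0, ncol(X))
for (i in seq_len(nrow(draws))) {
  beta <- rw_step(beta, X, y)
  draws[i, ] <- beta
}
```

Passing df = 0 switches the sketch to a Gaussian proposal, mirroring the documented behaviour of a nonpositive proposal.df.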

The third algorithm is an independence Metropolis sampler centered on the posterior mode with variance determined by posterior information matrix (Fisher information plus prior information). If proposal.df > 0 then the tails of the proposal are inflated so that a multivariate T proposal is used instead.
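
A minimal base R sketch of an independence Metropolis sampler of this kind follows. It is illustrative only. The function names are hypothetical, the Gaussian prior is an assumed stand-in for the slab, and the mode is found with optim rather than whatever method the package uses internally.

```r
# Log posterior: logistic likelihood plus independent N(0, prior.sd^2) priors.
# The Gaussian prior is an assumption made for this sketch.
log_post <- function(beta, X, y, prior.sd) {
  eta <- drop(X %*% beta)
  sum(plogis(ifelse(y == 1, eta, -eta), log.p = TRUE)) +
    sum(dnorm(beta, 0, prior.sd, log = TRUE))
}

independence_sampler <- function(X, y, niter, df = 3, prior.sd = 10) {
  d <- ncol(X)
  # Posterior mode by numerical optimisation.
  mode <- optim(rep(0, d), function(b) -log_post(b, X, y, prior.sd),
                method = "BFGS")$par
  # Posterior information = Fisher information at the mode + prior precision.
  prob <- plogis(drop(X %*% mode))
  info <- crossprod(X * sqrt(prob * (1 - prob))) + diag(1 / prior.sd^2, d)
  R <- chol(info)  # info = R'R

  # Log proposal density up to a constant. The proposal does not depend on
  # the current draw, so its normalising terms cancel in the ratio.
  log_q <- function(b) {
    quad <- sum((R %*% (b - mode))^2)
    if (df > 0) -0.5 * (df + d) * log1p(quad / df) else -0.5 * quad
  }
  # Draw from the proposal: multivariate T if df > 0, Gaussian otherwise.
  draw_q <- function() {
    z <- backsolve(R, rnorm(d))
    if (df > 0) z <- z / sqrt(rchisq(1, df) / df)
    mode + z
  }

  draws <- matrix(NA_real_, niter, d)
  cur <- mode
  for (i in seq_len(niter)) {
    cand <- draw_q()
    log_ratio <- log_post(cand, X, y, prior.sd) -
      log_post(cur, X, y, prior.sd) + log_q(cur) - log_q(cand)
    if (isTRUE(log(runif(1)) < log_ratio)) cur <- cand
    draws[i, ] <- cur
  }
  draws
}

# Usage on simulated data.
set.seed(42)
n <- 200
X <- cbind(1, rnorm(n), rnorm(n))
y <- rbinom(n, 1, plogis(drop(X %*% c(-0.5, 1, 0))))
draws <- independence_sampler(X, y, niter = 500)
```

The heavier T tails make it less likely that the proposal is much thinner than the posterior in some region, which is the usual failure mode of an independence sampler.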

For either the random walk or the independence update, at most mh.chunk.size coefficients are updated at a time. At each iteration, one of the three algorithms is chosen at random. The auxiliary mixture sampler is the only one that can change the dimension of the coefficient vector. The random walk and independence algorithms only update the coefficients that are currently nonzero.
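
The chunking rule can be pictured with a short base R sketch. How the package actually groups coefficients into chunks and how it weights the three update types are not stated here, so the contiguous grouping and the equal selection probabilities below are assumptions, and the move labels are hypothetical.

```r
# Current draw of the coefficients; zeros are excluded from the model.
beta <- c(0.8, 0, -1.2, 0.4, 0, 0, 2.1, -0.3, 0.6, 0, 1.5, -0.7, 0.9)
mh.chunk.size <- 4

# Only the currently nonzero coefficients are eligible for an MH update.
included <- which(beta != 0)

# Group the eligible positions into chunks of at most mh.chunk.size.
chunks <- split(included, ceiling(seq_along(included) / mh.chunk.size))

# Choose one update type at random (equal weights assumed).
move <- sample(c("auxiliary.mixture", "random.walk", "independence"), 1)

print(chunks)
```

Smaller chunks keep each proposal low-dimensional, which tends to raise acceptance rates at the cost of more proposals per sweep.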

References

Tuchler (2008), "Bayesian Variable Selection for Logistic Models Using Auxiliary Mixture Sampling", Journal of Computational and Graphical Statistics, 17, 76--94.

Examples

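library(BoomSpikeSlab)
library(MASS)  # provides the Pima.tr and Pima.te data sets

data(Pima.tr)
data(Pima.te)
pima <- rbind(Pima.tr, Pima.te)
# Fit the maximal model; increase niter for real analyses.
model <- logit.spike(type == "Yes" ~ ., data = pima, niter = 500)
plot(model)
plot(model, "fit")
plot(model, "residuals")
plot(model, "size")
summary(model)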
