BoomSpikeSlab (version 0.5.2)

lm.spike: Spike and slab regression

Description

MCMC algorithm for linear regression models with a 'spike-and-slab' prior that places some amount of posterior probability at zero for a subset of the regression coefficients.

Usage

lm.spike(formula,
         niter,
         data,
         subset,
         prior = NULL,
         contrasts = NULL,
         drop.unused.levels = TRUE,
         bma.method = c("SSVS", "ODA"),
         oda.options = list(
             fallback.probability = 0.0,
             eigenvalue.fudge.factor = 0.01),
         ping = niter / 10,
         seed = NULL,
         ...)

Arguments

formula
A formula describing the maximal model, i.e. the model with all candidate variables included. This is parsed the same way as a formula passed to lm.
niter
The number of MCMC iterations to run. Be sure to include enough so you can throw away a burn-in set.
data
An optional data frame, list, or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lm.spike is called.
subset
an optional vector specifying a subset of observations to be used in the fitting process.
prior
An optional list returned by SpikeSlabPrior. If prior is missing then a default prior will be used. See SpikeSlabPrior.
contrasts
An optional list. See the contrasts.arg argument of model.matrix.default.
drop.unused.levels
Logical indicating whether unobserved factor levels should be dropped from the model.
bma.method
The MCMC method to use. SSVS is the stochastic search variable selection algorithm from George and McCulloch (1997). ODA is the orthogonal data augmentation method from Ghosh and Clyde (2011).
oda.options
If bma.method == "ODA" then this is a list of options for fine-tuning the ODA algorithm.
  • fallback.probability: Each MCMC iteration will use SSVS instead of ODA with this probability. In cases where the latent data have high leverage, ODA mixing can suffer. Mixing in a few SSVS steps can help keep an errant algorithm on track.
  • eigenvalue.fudge.factor: The latent X's will be chosen so that the complete-data X'X matrix (after scaling) is a constant diagonal matrix equal to the largest eigenvalue of the observed (scaled) X'X times (1 + eigenvalue.fudge.factor). This should be a small positive number.
ping
The frequency with which to print status update messages to the screen. For example, if ping == 10 then an update will be printed every 10 MCMC iterations.
seed
An integer to use as the random seed for the underlying C++ code. If NULL then the seed will be set using the clock.
...
Extra arguments to be passed to SpikeSlabPrior (if bma.method == "SSVS") or IndependentSpikeSlabPrior (if bma.method == "ODA").
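
As a quick illustration of how these arguments combine, the sketch below constructs a prior explicitly and also fits the same model with the ODA sampler. The data are simulated here only to make the example self-contained, and the arguments passed to SpikeSlabPrior (a design matrix including the intercept column, the response, and expected.model.size) are assumptions about that constructor's interface rather than part of this page.

## Self-contained sketch of explicit-prior and ODA calls.
library(BoomSpikeSlab)
x <- matrix(rnorm(100 * 5), nrow = 100)
y <- rnorm(100, x[, 1] - x[, 2], .5)

## Explicit prior for SSVS (the default bma.method).  The SpikeSlabPrior
## arguments used here are assumptions about its interface.
prior <- SpikeSlabPrior(cbind(1, x), y, expected.model.size = 2)
model.ssvs <- lm.spike(y ~ x, niter = 500, prior = prior)

## The same model fit with ODA, mixing in an occasional SSVS step.
model.oda <- lm.spike(y ~ x, niter = 500, bma.method = "ODA",
                      oda.options = list(fallback.probability = 0.05,
                                         eigenvalue.fudge.factor = 0.01))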

Value

Returns an object of class lm.spike, which is a list with the following elements:
  • beta: A niter by ncol(x) matrix of regression coefficients, many of which may be zero. Each row corresponds to an MCMC iteration.
  • sigma: A vector of length niter containing the MCMC draws of the residual standard deviation parameter.
  • prior: The prior used to fit the model. If a prior was supplied as an argument it will be returned. Otherwise this will be the automatically generated prior based on the other function arguments.
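
For example, the beta matrix can be converted into marginal posterior inclusion probabilities by counting how often each coefficient is nonzero across the retained draws. This is a base-R sketch that assumes a fitted object named model and a burn-in of 100 iterations:

## Marginal posterior inclusion probabilities from the beta draws,
## discarding the first 100 iterations as burn-in.
burn <- 100
inclusion.prob <- colMeans(model$beta[-(1:burn), , drop = FALSE] != 0)
sort(inclusion.prob, decreasing = TRUE)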

Details

Both methods ("ODA" and "SSVS") draw each variable-inclusion indicator given all the others in a Gibbs sampler. The ODA method includes an extra data augmentation step that renders each indicator conditionally independent of the others given the latent data. There is residual dependence between successive MCMC steps introduced by the latent data, but Ghosh and Clyde (2011) argue that, on balance, mixing should be improved.

Regarding overall compute time, the ODA method decomposes the (potentially very large) model matrix once, at the start of the algorithm, and thereafter works with independent scalar updates. The SSVS algorithm avoids that up-front cost but performs many small matrix decompositions in each MCMC iteration. The ODA algorithm is therefore likely to be faster in terms of time per iteration.
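
The per-iteration cost difference can be checked directly with system.time. The sketch below simulates a moderately wide design purely for illustration; the relative timings will depend on n, p, and the sparsity of the true model.

## Rough timing comparison of the two samplers on the same simulated data.
library(BoomSpikeSlab)
x <- matrix(rnorm(500 * 50), nrow = 500)
y <- rnorm(500, x[, 1] + 2 * x[, 2], 1)
system.time(lm.spike(y ~ x, niter = 200, bma.method = "SSVS"))
system.time(lm.spike(y ~ x, niter = 200, bma.method = "ODA"))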

Finally, note that the two algorithms require slightly different priors. The ODA algorithm requires a priori independence of the coefficients (see IndependentSpikeSlabPrior), while the SSVS algorithm can work with arbitrary conjugate priors (see SpikeSlabPrior).
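
Continuing with the simulated x and y above, extra arguments supplied through ... are forwarded to the appropriate prior constructor, so a quantity such as expected.model.size can be set without building a prior by hand. Treat the argument name as an assumption about the prior constructors' interfaces.

## Arguments in '...' go to SpikeSlabPrior (SSVS) or
## IndependentSpikeSlabPrior (ODA); expected.model.size is assumed to be
## accepted by both constructors.
model.ssvs <- lm.spike(y ~ x, niter = 500, expected.model.size = 2)
model.oda  <- lm.spike(y ~ x, niter = 500, bma.method = "ODA",
                       expected.model.size = 2)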

References

George and McCulloch (1997). "Approaches for Bayesian Variable Selection", Statistica Sinica, 7, 339--373. http://www3.stat.sinica.edu.tw/statistica/oldpdf/A7n26.pdf

Ghosh and Clyde (2011). "Rao-Blackwellization for Bayesian variable selection and model averaging in linear and binary regression: A novel data augmentation approach", Journal of the American Statistical Association, 106, 1041--1052. http://homepage.stat.uiowa.edu/~jghsh/ghosh_clyde_2011_jasa.pdf

See Also

SpikeSlabPrior, plot.lm.spike, summary.lm.spike, predict.lm.spike.

Examples

library(BoomSpikeSlab)

n <- 100
p <- 10
ngood <- 3      # number of truly nonzero coefficients
niter <- 1000
sigma <- .8

x <- cbind(1, matrix(rnorm(n * (p - 1)), nrow = n))
beta <- c(rnorm(ngood), rep(0, p - ngood))
y <- rnorm(n, x %*% beta, sigma)
x <- x[, -1]    # drop the explicit intercept column; the formula adds one
model <- lm.spike(y ~ x, niter = niter)
plot.ts(model$beta)     # trace plots of the coefficient draws
opar <- par(ask = TRUE)
hist(model$sigma)       # should be concentrated near 0.8
plot(model)
summary(model)
par(opar)
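
As a follow-up, the returned draws can be summarized with base R alone; for instance, the residual standard deviation after discarding a burn-in set:

## Posterior summary of the residual standard deviation after burn-in.
burn <- 100
quantile(model$sigma[-(1:burn)], c(.025, .5, .975))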
