lm.spike: Spike and slab regression

Description

MCMC algorithm for linear regression models with a 'spike-and-slab' prior that places some amount of posterior probability at zero for a subset of the regression coefficients.

The model admits either Gaussian or Student T errors; the latter are useful in the presence of outliers.

Usage

lm.spike(formula,
         niter,
         data,
         subset,
         prior = NULL,
         error.distribution = c("gaussian", "student"),
         contrasts = NULL,
         drop.unused.levels = TRUE,
         bma.method = c("SSVS", "ODA"),
         oda.options = list(
             fallback.probability = 0.0,
             eigenvalue.fudge.factor = 0.01),
         ping = niter / 10,
         seed = NULL,
         ...)

Arguments

formula

Formula for the maximal model (with all variables included). It is parsed the same way as in a call to lm.

niter

The number of MCMC iterations to run. Be sure to include enough so you can throw away a burn-in set.

data

An optional data frame, list or environment (or object coercible by 'as.data.frame' to a data frame) containing the variables in the model. If not found in 'data', the variables are taken from 'environment(formula)', typically the environment from which 'lm.spike' is called.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

prior

An optional list returned by SpikeSlabPrior. If prior is missing then a default prior will be used. See SpikeSlabPrior.

error.distribution

Specify either Gaussian or Student T errors. If the error distribution is student then the prior must be a StudentSpikeSlabPrior.

contrasts

An optional list. See the contrasts.arg argument of model.matrix.default.

drop.unused.levels

Logical indicating whether unobserved factor levels should be dropped from the model.

bma.method

The MCMC method to use. SSVS is the stochastic search variable selection algorithm from George and McCulloch (1997). ODA is the orthogonal data augmentation method from Ghosh and Clyde (2011).

oda.options

If bma.method == "ODA" then these are some options for fine tuning the ODA algorithm.

fallback.probability: Each MCMC iteration will use SSVS instead of ODA with this probability. In cases where the latent data have high leverage, ODA mixing can suffer. Mixing in a few SSVS steps can help keep an errant algorithm on track.
eigenvalue.fudge.factor: The latent X's will be chosen so that the complete data X'X matrix (after scaling) is a constant diagonal matrix equal to the largest eigenvalue of the observed (scaled) X'X times (1 + eigenvalue.fudge.factor). This should be a small positive number.

ping

The frequency with which to print status update messages to the screen. For example, if ping == 10 then an update will be printed every 10 MCMC iterations.

seed

An integer to use as the random seed for the underlying C++ code. If NULL then the seed will be set using the clock.

…

Extra arguments to be passed to SpikeSlabPrior (if bma.method == "SSVS") or IndependentSpikeSlabPrior (if bma.method == "ODA").

Value

Returns an object of class lm.spike, which is a list with the following elements:

beta

An niter by ncol(x) matrix of regression coefficients, many of which may be zero. Each row corresponds to an MCMC iteration.

sigma

A vector of length niter containing the MCMC draws of the residual standard deviation parameter.

prior

The prior used to fit the model. If a prior was supplied as an argument it will be returned. Otherwise this will be the automatically generated prior based on the other function arguments.
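Because each row of beta is one MCMC draw and excluded coefficients are stored as exact zeros, posterior inclusion probabilities can be estimated as the fraction of post-burn-in draws in which each coefficient is nonzero. The sketch below illustrates that calculation in base R. The matrix beta is a simulated stand-in for model$beta, and the names burn, kept, inclusion.prob, and posterior.mean are hypothetical; nothing here calls the package itself.

```r
## Simulated stand-in for model$beta (an assumption for illustration):
## 1000 draws of 4 coefficients, with exact zeros where a variable is excluded.
set.seed(1)
niter <- 1000
beta <- cbind(
  rnorm(niter, 2, 0.1),                            # included in every draw
  rnorm(niter, 1, 0.1) * rbinom(niter, 1, 0.7),    # included in most draws
  rnorm(niter, 0.5, 0.1) * rbinom(niter, 1, 0.1),  # rarely included
  rep(0, niter))                                   # never included
colnames(beta) <- c("(Intercept)", "x1", "x2", "x3")

## Discard a burn-in set (the length is a judgment call, 100 here).
burn <- 100
kept <- beta[-seq_len(burn), , drop = FALSE]

## Fraction of retained draws in which each coefficient is nonzero.
inclusion.prob <- colMeans(kept != 0)

## Posterior mean of each coefficient, averaging over the zeros as well.
posterior.mean <- colMeans(kept)

print(round(inclusion.prob, 2))
```

With a fitted model, the same two colMeans calls would be applied to model$beta after dropping the burn-in rows.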

Details

Both methods ("ODA" and "SSVS") draw each variable inclusion indicator given all others, in a Gibbs sampler. The ODA method includes an extra data augmentation step that renders each indicator conditionally independent of the others given the latent data. There is residual dependence between successive MCMC steps introduced by the latent data, but the paper by Ghosh and Clyde suggested that on balance mixing should be improved.

Regarding the overall compute time, the ODA method decomposes the (potentially very large) model matrix one time, at the start of the algorithm, and then works with independent scalar updates. The SSVS algorithm does not have the upfront cost, but it works with many small matrix decompositions in each MCMC iteration. The ODA algorithm is very likely to be faster in terms of time per iteration.

Finally, note that the two algorithms require slightly different priors. The ODA algorithm requires a priori independence, while the SSVS algorithm can work with arbitrary conjugate priors.
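The one-time setup described above, together with the eigenvalue.fudge.factor option, can be illustrated in base R. The sketch below is not the package's internal C++ code. It assumes that "scaling" means standardizing the columns of X, and it builds the latent rows with a Cholesky factor, which is only one of several valid choices. The names fudge, lambda.max, d, gap, xa, and complete are hypothetical.

```r
## A minimal sketch of the orthogonalizing augmentation, under the
## assumptions stated above.
set.seed(42)
n <- 50
p <- 4
x <- scale(matrix(rnorm(n * p), nrow = n))  # assumed scaling: standardized columns
xtx <- crossprod(x)                         # observed X'X

fudge <- 0.01                               # plays the role of eigenvalue.fudge.factor
lambda.max <- max(eigen(xtx, symmetric = TRUE, only.values = TRUE)$values)
d <- lambda.max * (1 + fudge)               # target constant diagonal

## The latent rows Xa must satisfy Xa'Xa = d * I - X'X.  A positive fudge
## factor keeps this matrix positive definite, so chol() succeeds.
gap <- d * diag(p) - xtx
xa <- chol(gap)                             # upper triangular, crossprod(xa) == gap

## The complete-data cross product is then a constant diagonal matrix.
complete <- xtx + crossprod(xa)
print(round(complete, 6))
```

With fudge set to exactly zero, gap becomes singular and chol() fails, which is one way to see why the option should be a small positive number.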

References

George and McCulloch (1997), "Approaches to Bayesian Variable Selection", Statistica Sinica, 7, 339-373. http://www3.stat.sinica.edu.tw/statistica/oldpdf/A7n26.pdf

Ghosh and Clyde (2011), "Rao-Blackwellization for Bayesian variable selection and model averaging in linear and binary regression: A novel data augmentation approach", Journal of the American Statistical Association, 106, 1041-1052. http://homepage.stat.uiowa.edu/~jghsh/ghosh_clyde_2011_jasa.pdf

Examples

# NOT RUN {
  n <- 100
  p <- 10
  ngood <- 3
  niter <- 1000
  sigma <- .8

  x <- cbind(1, matrix(rnorm(n * (p-1)), nrow=n))
  beta <- c(rnorm(ngood), rep(0, p - ngood))
  y <- rnorm(n, x %*% beta, sigma)
  x <- x[,-1]
  model <- lm.spike(y ~ x, niter=niter)
  plot.ts(model$beta)
  hist(model$sigma)  ## should be near 0.8
  plot(model)
  summary(model)
  plot(model, "residuals")

  ## Now replace the first observation with a big outlier.
  y[1] <- 50
  model <- lm.spike(y ~ x, niter = niter)
  model2 <- lm.spike(y ~ x, niter = niter, error.distribution = "student")
  pred <- predict(model, newdata = x)
  pred2 <- predict(model2, newdata = x)

  ## Maximize the plot window before making these box plots.  They show
  ## the posterior predictive distribution of all 100 data points, so
  ## make sure your screen is 100 boxes wide!
  par(mfrow = c(2,1))
  BoxplotTrue(t(pred), truth = y, ylim = range(pred), pch = ".",
     main = "Posterior predictive distribution assuming Gaussian errors.")
  BoxplotTrue(t(pred2), truth = y, ylim = range(pred), pch = ".",
     main = "Posterior predictive distribution assuming Student errors.")

  ## The posterior predictive distributions are much tighter in the
  ## student case than in the Gaussian case, even though the student
  ## model has heavier tails, because the "sigma" parameter is smaller.
  par(mfrow = c(1,1))
  CompareDensities(list(gaussian = model$sigma, student = model2$sigma),
                        xlab = "sigma")
# }