spikeSlabGAM: Generate posterior samples for a GAMM with spike-and-slab priors

Description

This function fits structured additive regression models with spike-and-slab priors via MCMC. The spike-and-slab prior results in an SSVS-like term selection that can be used to aid model choice, e.g. to decide between linear and smooth modelling of numeric covariates or which interaction effects are relevant. Model terms can be factors (fct), linear/polynomial terms (lin), uni- or bivariate splines (sm, srf), random intercepts (rnd) or Markov random fields (mrf) and their interactions, i.e. an interaction between two smooth terms yields an effect surface, an interaction between a linear term and a random intercept yields random slopes, an interaction between a linear term and a smooth term yields a varying coefficient term etc.

Usage

spikeSlabGAM(formula, data, ..., family = "gaussian",
    hyperparameters = list(), model = list(),
    mcmc = list(), start = list())

Arguments

formula

the model formula (see details below and ssGAMDesign).

data

the data containing the variables in the function

...

additional arguments for ssGAMDesign

family

(character) the family of the response, defaults to normal/Gaussian response, "poisson" and "binomial" are implemented as well.

hyperparameters

A list of arguments specifying the prior settings. See spikeAndSlab.

model

A list of arguments describing the model structure. See spikeAndSlab. User-supplied groupIndicators and H entries will be overwritten by

mcmc

A list of arguments specifying MCMC sampler options. See spikeAndSlab.

start

A list of starting values for the MCMC sampler. See spikeAndSlab. Use start=list(seed=) to set the RNG seed for reproducible results.

Value

an object of class spikeSlabGAM with methods summary.spikeSlabGAM, predict.spikeSlabGAM, and plot.spikeSlabGAM.

Details

Implemented types of terms: [object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object] Terms in the formula that are not in the list of specials (i.e. the list of term types above) are automatically assigned an appropriate special wrapper, i.e. numerical covariates x are treated as

lin(x) +
  sm(x)

, factors f (and numerical covariates with very few distinct values, see ssGAMDesign) are treated as fct(f). Valid model formulas have to satisfy the following conditions:

All variables that are involved in interactions have to be present as main effects as well, i.e. {y ~ x1 + x1:x2

will produce an error because the main effect of x2 is missing. Interactions between unpenalized terms not under selection (i.e. terms of type u) and penalized terms are not allowed, i.e. y ~ u(x1)*x2 will produce an error. If main effects are specified as special terms, the interactions involving them have to be given as special terms as well, i.e. y ~ lin(x1) + lin(x2) + x1:x2 will produce an error.

The default prior settings for Gaussian data work best for a response with unit variance. If your data is scaled very differently, either rescale the response (recommended) or adjust the hyperparameters accordingly.

Sampling of the chains is done in parallel using package multicore if available. If not, a socket cluster set up with snow is used where available. Use options(cores=foo) to set the (maximal) number of processes spawned by the parallelization. If options()$cores is unspecified, snow uses 8.

## examples not run due to stringent time constraints on CRAN checks. ## full examples below should take about 2-4 minutes.

set.seed(91179) n <- 400 d <- data.frame(f1 = sample(gl(3, n/3)), f2 = sample(gl(4, n/4)), x1 = runif(n), x2 = runif(n), x3 = runif(n)) # true model: # - interaction between f1 and x1 # - smooth interaction between x1 and x2, # - x3 and f2 are noise variables without influence on y nf1 <- as.numeric(d$f1) d$f <- with(d, 5 * (nf1 + 2 * sin(4 * pi * (x1 - 0.2) * (x2 - 0.7)) - nf1 * x1)) d$y <- with(d, scale(f + rnorm(n))) d$yp <- with(d, rpois(n, exp(f/10)))

# fit & display the model: m1 <- spikeSlabGAM(y ~ x1 * f1 + f1 * f2 + x3 * f1 + x1 * x2, data = d) summary(m1)

# visualize estimates: plot(m1) plot(m1, cumulative = FALSE) (plotTerm("fct(f1):fct(f2)", m1, aggregate = median)) print(p <- plotTerm("sm(x1):sm(x2)", m1, quantiles = c(0.25, 0.75), cumulative = FALSE))

# change MCMC settings and priors: mcmc <- list(nChains = 3, burnin = 100, chainLength = 1000, thin = 3, reduceRet = TRUE) hyper <- list(gamma = c(v0 = 5e-04), tau2 = c(10, 30), w = c(2, 1))

# complicated formula example, poisson response: m2 <- spikeSlabGAM(yp ~ x1 * (x2 + f1) + (x2 + x3 + f2)^2 - sm(x2):sm(x3), data = d, family = "poisson", mcmc = mcmc, hyperparameters = hyper) summary(m2) plot(m2)

# quick&dirty convergence diagnostics: print(b <- ssGAM2Bugs(m2)) plot(b)

[object Object] Fabian Scheipl (2011). spikeSlabGAM: Bayesian Variable Selection, Model Choice and Regularization for Generalized Additive Mixed Models in R. Journal of Statistical Software, 43(14), 1--24.

Fabian Scheipl, Ludwig Fahrmeir, Thomas Kneib (2012). Spike-and-Slab Priors for Function Selection in Structured Additive Regression Models. Journal of the American Statistical Association, 107(500), 1518--1532.

ssGAMDesign for details on model specification, spikeAndSlab for more details on the MCMC sampler and prior specification, and ssGAM2Bugs for MCMC diagnostics. Check out the vignette for theoretical background and code examples.