stan_gamm4: Bayesian generalized linear additive models with group-specific terms via Stan

Description

Bayesian inference for GAMMs with flexible priors.

Usage

stan_gamm4(formula, random = NULL, family = gaussian(), data = list(), weights = NULL, subset = NULL, na.action, knots = NULL, drop.unused.levels = TRUE, ..., prior = normal(), prior_intercept = normal(), prior_ops = prior_options(), prior_covariance = decov(), prior_PD = FALSE, algorithm = c("sampling", "meanfield", "fullrank"), adapt_delta = NULL, QR = FALSE, sparse = FALSE)
plot_nonlinear(x, smooths, prob = 0.9, facet_args = list(), ..., alpha = 1, size = 0.75)

Arguments

formula, random, family, data, knots, drop.unused.levels

Same as for gamm4.

subset, weights, na.action

Same as glm, but rarely specified.

...

Further arguments passed to sampling (e.g. iter, chains, cores) or to vb (if algorithm is "meanfield" or "fullrank").

prior

The prior distribution for the regression coefficients. prior can be a call to normal, student_t, cauchy, hs or hs_plus. See priors for details. To omit a prior (i.e., to use a flat, improper uniform prior), set prior to NULL, although this is rarely a good idea. (Note: unless QR = TRUE, if the scaled argument to prior_options is left at its default and recommended value of TRUE, then the scale(s) of prior may be modified internally based on the scales of the predictors, as in the arm package. See priors for details on the rescaling and prior_summary for a summary of the priors used for a particular model.)

prior_intercept

The prior distribution for the intercept. prior_intercept can be a call to normal, student_t or cauchy. See priors for details. To omit a prior (i.e., to use a flat, improper uniform prior), set prior_intercept to NULL. (Note: if a dense representation of the design matrix is utilized, i.e., if the sparse argument is left at its default value of FALSE, then the prior distribution for the intercept is set so it applies to the value when all predictors are centered.)

prior_ops

Additional options related to prior distributions. Set to NULL to omit a prior on the dispersion; otherwise, see prior_options.

prior_covariance

Cannot be NULL; see decov for more information about the default arguments.

prior_PD

A logical scalar (defaulting to FALSE) indicating whether to draw from the prior predictive distribution instead of conditioning on the outcome.

algorithm

A string (possibly abbreviated) indicating the estimation approach to use. Can be "sampling" for MCMC (the default), "optimizing" for optimization, "meanfield" for variational inference with independent normal distributions, or "fullrank" for variational inference with a multivariate normal distribution. See rstanarm-package for more details on the estimation algorithms. NOTE: not all fitting functions support all four algorithms.

adapt_delta

Only relevant if algorithm="sampling". See adapt_delta for details.

QR

A logical scalar (defaulting to FALSE). If TRUE, a scaled QR decomposition is applied to the design matrix, $X = Q^\ast R^\ast$, where $Q^\ast = Q \sqrt{n-1}$ and $R^\ast = \frac{1}{\sqrt{n-1}} R$. The coefficients relative to $Q^\ast$ are obtained and then premultiplied by the inverse of $R^\ast$ to obtain coefficients relative to the original predictors, $X$. These transformations do not change the likelihood of the data but are recommended for computational reasons when there are multiple predictors. However, because the coefficients relative to $Q^\ast$ are not very interpretable, it is hard to specify an informative prior for them. Setting QR = TRUE is therefore only recommended if you do not have an informative prior for the regression coefficients.

sparse

A logical scalar (defaulting to FALSE) indicating whether to use a sparse representation of the design (X) matrix. Setting this to TRUE will likely be twice as slow, even if the design matrix has a considerable number of zeros, but it may allow the model to be estimated when the computer has too little RAM to utilize a dense design matrix. If TRUE, the design matrix is not centered (since that would destroy the sparsity), and it is not possible to specify both QR = TRUE and sparse = TRUE.

x

An object produced by stan_gamm4.

smooths

An optional character vector specifying a subset of the smooth functions specified in the call to stan_gamm4. The default is to include all smooth terms.

prob

A scalar between 0 and 1 governing the width of the uncertainty interval.

facet_args

An optional named list of arguments passed to facet_wrap (other than the facets argument).

alpha, size

Passed to geom_ribbon.
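
The scaled QR reparameterization described under the QR argument can be illustrated in base R. The following is a minimal sketch, not rstanarm's internal code: the simulated data, the variable names, and the use of least squares as a stand-in for posterior estimation are all assumptions made for illustration.

```r
# Sketch of the scaled QR reparameterization (illustrative assumptions only)
set.seed(1)
n <- 50
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
X <- scale(X, center = TRUE, scale = FALSE)   # centered predictors
y <- drop(1 + X %*% c(0.5, -1, 2)) + rnorm(n)

decomp <- qr(X)
Q_star <- qr.Q(decomp) * sqrt(n - 1)   # Q* = Q (n-1)^0.5
R_star <- qr.R(decomp) / sqrt(n - 1)   # R* = (n-1)^(-0.5) R

# Q* R* reproduces X, so the likelihood is unchanged
stopifnot(isTRUE(all.equal(Q_star %*% R_star, X, check.attributes = FALSE)))

# Coefficients relative to Q* (least squares here, standing in for the posterior)
theta <- coef(lm(y ~ Q_star))[-1]

# Premultiply by the inverse of R* to get coefficients relative to X
beta <- backsolve(R_star, theta)

# Coefficients from regressing directly on X, for comparison
beta_direct <- coef(lm(y ~ X))[-1]
```

In this sketch beta and beta_direct agree up to numerical error, which is the sense in which the transformation leaves the fit unchanged. A prior stated on theta, however, does not translate simply into a prior on beta.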

Value

A stanreg object is returned for stan_gamm4. plot_nonlinear returns a ggplot object.

Details

The stan_gamm4 function is similar in syntax to gamm4 in the gamm4 package, which accepts a syntax that is similar to (but not quite as extensive as) that for gamm in the mgcv package and converts it internally into the syntax accepted by glmer in the lme4 package. But rather than performing (restricted) maximum likelihood estimation, the stan_gamm4 function utilizes MCMC to perform Bayesian estimation. The Bayesian model adds independent priors on the common regression coefficients (in the same way as stan_glm) and priors on the terms of a decomposition of the covariance matrices of the group-specific parameters, including the smooths. Estimating these models via MCMC avoids the optimization issues that often crop up with GAMMs and provides better estimates for the uncertainty in the parameter estimates. See gamm4 for more information about the model specification and priors for more information about the priors.

The plot_nonlinear function creates a ggplot object with one facet for each smooth function specified in the call to stan_gamm4. A subset of the smooth functions can be specified using the smooths argument. The plot is conceptually similar to plot.gam, except that the outer lines here mark the edges of posterior uncertainty intervals (credible intervals) rather than confidence intervals, and the inner line is the posterior median of the function rather than the function implied by a point estimate. To change the colors used in the plot, see color_scheme_set.
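
The relationship between prob, the outer lines, and the inner line can be sketched in base R. This is a hedged illustration only: the draws matrix below is simulated rather than taken from a fitted model, and the assumption that the interval is a central one (equal tail probabilities on each side) is ours rather than something stated on this page.

```r
# Sketch: pointwise posterior median and central uncertainty interval
# for a smooth, computed from hypothetical posterior draws.
set.seed(123)
x_grid  <- seq(0, 1, length.out = 25)
n_draws <- 1000

# Hypothetical draws of f(x): one row per draw, one column per grid point
draws <- outer(rnorm(n_draws, mean = 1, sd = 0.2), sin(2 * pi * x_grid))

prob    <- 0.9
lower_p <- (1 - prob) / 2   # 0.05 when prob = 0.9
upper_p <- 1 - lower_p      # 0.95 when prob = 0.9

bands <- data.frame(
  x      = x_grid,
  lower  = apply(draws, 2, quantile, probs = lower_p),  # lower outer line
  middle = apply(draws, 2, median),                     # inner line
  upper  = apply(draws, 2, quantile, probs = upper_p)   # upper outer line
)
head(bands)
```

Raising prob toward 1 widens the band at every grid point, while the median line stays the same.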

References

Crainiceanu, C., Ruppert, D., and Wand, M. (2005). Bayesian Analysis for Penalized Spline Regression Using WinBUGS. Journal of Statistical Software, 14(14), 1-22. https://www.jstatsoft.org/article/view/v014i14

Examples

# from example(gamm4, package = "gamm4"), prefixing gamm4() call with stan_
library(rstanarm)

dat <- mgcv::gamSim(1, n = 400, scale = 2) ## simulate 4 term additive truth
## Now add 20 level random effect `fac'...
dat$fac <- fac <- as.factor(sample(1:20, 400, replace = TRUE))
dat$y <- dat$y + model.matrix(~ fac - 1) %*% rnorm(20) * .5

br <- stan_gamm4(y ~ s(x0) + x1 + s(x2), data = dat, random = ~ (1 | fac), 
                 QR = TRUE, chains = 1)
print(br)
plot_nonlinear(br)
plot_nonlinear(br, smooths = "s(x0)", alpha = 2/3)