stan_glm: Bayesian generalized linear models via Stan

Description

Generalized linear modeling with optional prior distributions for the coefficients, intercept, and nuisance parameter.

Usage

stan_glm(formula, family = gaussian(), data, weights, subset,
  na.action = NULL, offset = NULL, model = TRUE, x = FALSE, y = TRUE,
  contrasts = NULL, ..., prior = normal(), prior_intercept = normal(),
  prior_ops = prior_options(), prior_PD = FALSE, algorithm = c("sampling",
  "optimizing", "meanfield", "fullrank"), adapt_delta = NULL, QR = FALSE)
stan_glm.nb(..., link = "log")
stan_glm.fit(x, y, weights = rep(1, NROW(x)), offset = rep(0, NROW(x)),
  family = gaussian(), ..., prior = normal(), prior_intercept = normal(),
  prior_ops = prior_options(), group = list(), prior_PD = FALSE,
  algorithm = c("sampling", "optimizing", "meanfield", "fullrank"),
  adapt_delta = NULL, QR = FALSE)

Arguments

formula, data, subset

Same as glm.

family

Same as glm, except negative binomial GLMs are also possible using the neg_binomial_2 family object.

na.action, contrasts

Same as glm, but rarely specified.

model, offset, weights

Same as glm.

x, y

In stan_glm and stan_glm.nb, logical scalars indicating whether to return the design matrix and the response vector, respectively. In stan_glm.fit, a design matrix and a response vector.

...

Further arguments passed to the function in the rstan package (sampling, vb, or optimizing), depending on the estimation approach named by algorithm.

prior

Prior for coefficients. See priors for details. Set prior to NULL to omit a prior, i.e., use an (improper) uniform prior.
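As a rough illustration of what an omitted (flat) prior means, the base-R sketch below (not rstanarm code) maximizes a logistic-regression log posterior with optim. With the prior term dropped, the posterior mode coincides with the maximum-likelihood estimate from glm; with an independent normal(0, 1) prior on every coefficient, the mode is pulled toward zero. The helper names log_posterior, grad_log_posterior, and posterior_mode are hypothetical, and putting the same prior on the intercept is a simplification made only for this example.

```r
# Hypothetical helper (not rstanarm code): log posterior of a logistic
# regression with independent normal(0, prior_sd) priors on all coefficients.
# prior_sd = Inf drops the prior term, mimicking a flat (improper) prior.
log_posterior <- function(theta, X, y, prior_sd = Inf) {
  eta <- drop(X %*% theta)
  loglik <- sum(y * eta - log1p(exp(eta)))   # Bernoulli log-likelihood
  logprior <- if (is.finite(prior_sd)) {
    sum(dnorm(theta, mean = 0, sd = prior_sd, log = TRUE))
  } else {
    0
  }
  loglik + logprior
}

# Gradient of log_posterior, supplied so that BFGS converges tightly.
grad_log_posterior <- function(theta, X, y, prior_sd = Inf) {
  eta <- drop(X %*% theta)
  g <- drop(crossprod(X, y - plogis(eta)))
  if (is.finite(prior_sd)) g <- g - theta / prior_sd^2
  g
}

# Posterior mode via optim; fnscale = -1 turns minimization into maximization.
posterior_mode <- function(X, y, prior_sd = Inf) {
  optim(rep(0, ncol(X)), log_posterior, grad_log_posterior,
        X = X, y = y, prior_sd = prior_sd,
        method = "BFGS", control = list(fnscale = -1, reltol = 1e-12))$par
}

X <- cbind(1, as.vector(scale(mtcars$wt)))   # intercept + standardized weight
y <- mtcars$am                               # 1 = manual transmission

mle         <- unname(coef(glm(y ~ X - 1, family = binomial)))
mode_flat   <- posterior_mode(X, y)                 # flat prior, like prior = NULL
mode_normal <- posterior_mode(X, y, prior_sd = 1)   # normal(0, 1) prior

rbind(mle = mle, flat = mode_flat, normal = mode_normal)
```

Only the mode is compared here; with algorithm = "sampling" the whole posterior is explored rather than just its peak.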

prior_intercept

Prior for intercept. See priors for details. Set prior_intercept to NULL to omit a prior, i.e., use an (improper) uniform prior. (Note: the prior distribution for the intercept is set so it applies to the value when all predictors are centered.)

prior_ops

Additional options related to prior distributions. Set to NULL to omit a prior on the dispersion parameter; otherwise see prior_options.

prior_PD

A logical scalar (defaulting to FALSE) indicating whether to draw from the prior predictive distribution instead of conditioning on the outcome.
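To make prior_PD = TRUE concrete, here is a base-R sketch of prior predictive simulation for a logistic regression: each draw samples an intercept and coefficients from their priors and then simulates a full outcome vector, without ever using the observed outcome. This illustrates the idea only and is not how rstanarm implements it; the function name prior_predictive_logit, the normal(0, 1) and normal(0, 5) priors, and the use of standardized predictors are assumptions chosen for the example.

```r
# Hypothetical helper (not rstanarm code): prior predictive draws for a
# logistic regression. The observed outcome is never used.
prior_predictive_logit <- function(X, n_draws = 1000,
                                   prior_sd = 1, intercept_sd = 5) {
  n <- nrow(X)
  y_rep <- matrix(0L, nrow = n_draws, ncol = n)
  for (s in seq_len(n_draws)) {
    alpha <- rnorm(1, mean = 0, sd = intercept_sd)        # intercept from its prior
    beta  <- rnorm(ncol(X), mean = 0, sd = prior_sd)      # coefficients from their prior
    eta   <- alpha + drop(X %*% beta)                     # linear predictor
    y_rep[s, ] <- rbinom(n, size = 1, prob = plogis(eta)) # simulated 0/1 outcomes
  }
  y_rep
}

set.seed(123)
X <- scale(as.matrix(mtcars[, c("wt", "hp")]))   # standardized predictors
y_rep <- prior_predictive_logit(X, n_draws = 200)
dim(y_rep)                  # 200 32: one row per prior draw, one column per car
summary(rowMeans(y_rep))    # spread of the simulated share of 1s under the priors
```

Wide intercept priors tend to push many simulated data sets toward all 0s or all 1s, which is the kind of implication a prior predictive check is meant to surface.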

algorithm

Character string (possibly abbreviated) indicating the estimation approach to use. Can be "sampling" for MCMC (the default), "optimizing" for optimization, "meanfield" for variational inference with independent normal distributions, or "fullrank" for variational inference with a multivariate normal distribution.

adapt_delta

Only relevant if algorithm="sampling". See adapt_delta for details.

QR

A logical scalar (defaulting to FALSE); if TRUE, a scaled QR decomposition is applied to the design matrix, $X = Q^\ast R^\ast$, where $Q^\ast = Q \sqrt{n-1}$ and $R^\ast = \frac{1}{\sqrt{n-1}} R$, so that $Q^\ast R^\ast = Q R = X$.
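The scaled decomposition, with $Q^\ast = Q \sqrt{n-1}$ and $R^\ast = R / \sqrt{n-1}$, can be checked numerically in base R. The sketch also shows why coefficients estimated on $Q^\ast$ lose nothing: since $X\beta = Q^\ast (R^\ast \beta)$, coefficients $\theta$ on $Q^\ast$ map back to $\beta = (R^\ast)^{-1}\theta$. Here lm stands in for the Bayesian fit, so only the linear algebra is illustrated; the mtcars predictors are an arbitrary choice.

```r
X <- as.matrix(mtcars[, c("wt", "hp", "disp")])   # full-rank design, no pivoting expected
n <- nrow(X)
qr_X <- qr(X)

Q_star <- qr.Q(qr_X) * sqrt(n - 1)   # Q* = Q sqrt(n - 1)
R_star <- qr.R(qr_X) / sqrt(n - 1)   # R* = R / sqrt(n - 1)

# X = Q* R*, and the columns of Q* are orthogonal with squared norm n - 1
all.equal(Q_star %*% R_star, X, check.attributes = FALSE)
all.equal(crossprod(Q_star), diag(n - 1, 3), check.attributes = FALSE)

# Regress on Q*, then map the coefficients back to the original predictors
theta <- unname(coef(lm(mtcars$mpg ~ Q_star))[-1])   # coefficients on Q*
beta  <- backsolve(R_star, theta)                    # beta = (R*)^{-1} theta
beta_direct <- unname(coef(lm(mpg ~ wt + hp + disp, data = mtcars))[-1])
all.equal(beta, beta_direct)
```

The fitted values are identical either way; what changes is the geometry the sampler works in, since the columns of $Q^\ast$ are orthogonal.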

link

For stan_glm.nb only, the link function to use. See neg_binomial_2.
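The neg_binomial_2 family follows Stan's NB2 parameterization: an outcome with mean $\mu$ and parameter $\phi$ (a reciprocal dispersion, so larger $\phi$ means less overdispersion) has variance $\mu + \mu^2/\phi$. Base R's rnbinom uses the same parameterization when called with mu and size, so a quick simulation shows the log link and the mean-variance relationship. This is a sketch; the values of eta and phi are arbitrary.

```r
set.seed(42)
eta <- 1.5          # arbitrary value of the linear predictor
mu  <- exp(eta)     # log link: the inverse link maps eta to the mean
phi <- 2            # reciprocal dispersion; rnbinom calls this 'size'
y <- rnbinom(1e5, size = phi, mu = mu)

c(sample_mean = mean(y), mu = mu)                     # sample vs. theoretical mean
c(sample_var = var(y), nb2_var = mu + mu^2 / phi)     # sample vs. NB2 variance
```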

group

A list, possibly of length zero (the default), but otherwise having the structure of that produced by mkReTrms to indicate the group-specific part of the model. In addition, this list must have element

Value

A stanreg object is returned for stan_glm and stan_glm.nb.
A stanfit object (or a slightly modified stanfit object) is returned if stan_glm.fit is called directly.

Details

The stan_glm function is similar in syntax to glm, but rather than performing maximum likelihood estimation of generalized linear models, it performs full Bayesian estimation via MCMC (when algorithm is "sampling"). The Bayesian model adds independent priors on the coefficients of the GLM. The stan_glm function calls the workhorse stan_glm.fit function, but it is also possible to call the latter directly. The stan_glm.nb function, which takes the extra argument link, is a simple wrapper for stan_glm with family = neg_binomial_2(link).

References

Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, Cambridge, UK. (Ch. 3-6)

Examples

library(rstanarm)

if (!grepl("^sparc", R.version$platform)) {
### Linear regression
fit <- stan_glm(mpg / 10 ~ ., data = mtcars, QR = TRUE,
                algorithm = "fullrank") # for speed only
plot(fit, ci_level = 0.5)
plot(fit, ci_level = 0.5, pars = "beta")

### Logistic regression
data(lalonde, package = "arm")
dat <- within(lalonde, {
  re74_1k <- re74 / 1000
  re75_1k <- re75 / 1000
})
t7 <- student_t(df = 7)
fmla <- treat ~ re74_1k + re75_1k + educ + black + hisp + 
               married + nodegr + u74 + u75
fit2 <- stan_glm(fmla, data = dat, family = binomial(link="logit"), 
                 prior = t7, prior_intercept = t7, 
                 algorithm = "fullrank") # for speed only
plot(fit2, pars = c("black", "hisp", "nodegr", "u74", "u75"), 
     ci_level = 0.67, outer_level = 1, show_density = TRUE)
pp_check(fit2, check = "resid")
pp_check(fit2, check = "test", test = "mean")
}
### Poisson regression (example from help("glm")) 
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
fit3 <- stan_glm(counts ~ outcome + treatment, family = poisson(link="log"),
                 prior = normal(0, 1), prior_intercept = normal(0, 5))
plot(fit3, fill_color = "skyblue4", est_color = "maroon")

### Gamma regression (example from help("glm"))
clotting <- data.frame(log_u = log(c(5,10,15,20,30,40,60,80,100)),
                       lot1 = c(118,58,42,35,27,25,21,19,18),
                       lot2 = c(69,35,26,21,18,16,13,12,12))
fit4 <- stan_glm(lot1 ~ log_u, data = clotting, family = Gamma)
print(fit4, digits = 2)
fit5 <- update(fit4, formula = lot2 ~ log_u)
