stan_glm: Bayesian generalized linear models via Stan

Description

Generalized linear modeling with optional prior distributions for the coefficients, intercept, and auxiliary parameters.

Usage

stan_glm(formula, family = gaussian(), data, weights, subset, na.action = NULL, offset = NULL, model = TRUE, x = FALSE, y = TRUE, contrasts = NULL, ..., prior = normal(), prior_intercept = normal(), prior_aux = cauchy(0, 5), prior_PD = FALSE, algorithm = c("sampling", "optimizing", "meanfield", "fullrank"), adapt_delta = NULL, QR = FALSE, sparse = FALSE)
stan_glm.nb(formula, data, weights, subset, na.action = NULL, offset = NULL, model = TRUE, x = FALSE, y = TRUE, contrasts = NULL, link = "log", ..., prior = normal(), prior_intercept = normal(), prior_aux = cauchy(0, 5), prior_PD = FALSE, algorithm = c("sampling", "optimizing", "meanfield", "fullrank"), adapt_delta = NULL, QR = FALSE)
stan_glm.fit(x, y, weights = rep(1, NROW(x)), offset = rep(0, NROW(x)), family = gaussian(), ..., prior = normal(), prior_intercept = normal(), prior_aux = cauchy(0, 5), prior_ops = NULL, group = list(), prior_PD = FALSE, algorithm = c("sampling", "optimizing", "meanfield", "fullrank"), adapt_delta = NULL, QR = FALSE, sparse = FALSE)

Arguments

formula, data, subset

Same as glm.

family

Same as glm, except negative binomial GLMs are also possible using the neg_binomial_2 family object.

na.action, contrasts

Same as glm, but rarely specified.

model, offset, weights

Same as glm.

x, y

In stan_glm and stan_glm.nb, logical scalars indicating whether to return the design matrix and response vector. In stan_glm.fit, a design matrix and response vector.

...

Further arguments passed to the function in the rstan package (sampling, vb, or optimizing) corresponding to the estimation method named by algorithm. For example, if algorithm is "sampling" it is possible to specify iter, chains, cores, refresh, etc.

prior

The prior distribution for the regression coefficients. prior should be a call to one of the various functions provided by rstanarm for specifying priors. The subset of these functions that can be used for the prior on the coefficients can be grouped into several "families":

Family                           Functions
Student t family                 normal, student_t, cauchy
Hierarchical shrinkage family    hs, hs_plus
Laplace family                   laplace, lasso
Product normal family            product_normal

See the priors help page for details on the families and how to specify the arguments for all of the functions in the table above. To omit a prior, i.e., to use a flat (improper) uniform prior, set prior to NULL, although this is rarely a good idea.

Note: Unless QR=TRUE, if prior is from the Student t family or Laplace family, and if the autoscale argument to the function used to specify the prior (e.g. normal) is left at its default and recommended value of TRUE, then the default or user-specified prior scale(s) may be adjusted internally based on the scales of the predictors. See the priors help page for details on the rescaling and the prior_summary function for a summary of the priors used for a particular model.
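
One way to read the "Student t family" grouping in the table above is that the Cauchy and normal distributions are special cases of the Student t distribution (1 degree of freedom and the limit of infinite degrees of freedom, respectively). This is a standard distributional fact rather than something stated on this page. The base R sketch below checks it for the standard (location 0, scale 1) densities; the variable names are illustrative.

```r
# Evaluate the standard densities on a grid of points.
x <- seq(-5, 5, by = 0.25)

# Student t with 1 degree of freedom is the Cauchy distribution.
t_df1 <- dt(x, df = 1)
cauchy <- dcauchy(x)

# Student t with infinite degrees of freedom is the normal distribution.
t_dfInf <- dt(x, df = Inf)
normal <- dnorm(x)

stopifnot(isTRUE(all.equal(t_df1, cauchy)))
stopifnot(isTRUE(all.equal(t_dfInf, normal)))
```

Intermediate degrees of freedom, such as the df = 7 used in the logistic regression example on this page, give tails that are heavier than the normal but lighter than the Cauchy.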

prior_intercept

The prior distribution for the intercept. prior_intercept can be a call to normal, student_t or cauchy. See the priors help page for details on these functions. To omit a prior on the intercept, i.e., to use a flat (improper) uniform prior, set prior_intercept to NULL.

Note: If using a dense representation of the design matrix, i.e., if the sparse argument is left at its default value of FALSE, then the prior distribution for the intercept is set so it applies to the value when all predictors are centered.
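
What it means for the intercept to refer to "the value when all predictors are centered" can be illustrated without Stan. The sketch below is an assumption-laden stand-in: it uses ordinary least squares (lm) on the built-in mtcars data rather than stan_glm, and the variable names are illustrative. It shows only that, with centered predictors, the intercept is the expected outcome at the predictor means rather than at zero.

```r
# Raw predictors: the intercept is the expected mpg at wt = 0 and hp = 0.
fit_raw <- lm(mpg ~ wt + hp, data = mtcars)

# Centered predictors: the intercept is the expected mpg at average wt and hp.
centered <- transform(mtcars, wt = wt - mean(wt), hp = hp - mean(hp))
fit_centered <- lm(mpg ~ wt + hp, data = centered)

intercept_raw <- unname(coef(fit_raw)["(Intercept)"])
intercept_centered <- unname(coef(fit_centered)["(Intercept)"])

# Centering leaves the slopes unchanged; only the intercept moves.
stopifnot(isTRUE(all.equal(coef(fit_raw)[-1], coef(fit_centered)[-1])))

# For least squares, the centered intercept equals the sample mean of the outcome.
stopifnot(isTRUE(all.equal(intercept_centered, mean(mtcars$mpg))))
```

Under this reading, a prior passed to prior_intercept is a statement about the outcome at average predictor values, which is usually easier to reason about than the outcome when every predictor equals zero.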

prior_aux

The prior distribution for the "auxiliary" parameter (if applicable). The "auxiliary" parameter refers to a different parameter depending on the family. For Gaussian models prior_aux controls "sigma", the error standard deviation. For negative binomial models prior_aux controls "reciprocal_dispersion", which is similar to the "size" parameter of rnbinom: smaller values of "reciprocal_dispersion" correspond to greater dispersion. For gamma models prior_aux sets the prior on the "shape" parameter (see, e.g., rgamma), and for inverse-Gaussian models it sets the prior on the so-called "lambda" parameter (which is essentially the reciprocal of a scale parameter). Binomial and Poisson models do not have auxiliary parameters.

prior_aux can be a call to exponential to use an exponential distribution, or normal, student_t or cauchy, which results in a half-normal, half-t, or half-Cauchy prior. See priors for details on these functions. To omit a prior, i.e., to use a flat (improper) uniform prior, set prior_aux to NULL.
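
The statement that smaller values of "reciprocal_dispersion" correspond to greater dispersion can be made concrete with the variance formula documented for rnbinom in its mu parameterization, mu + mu^2 / size. The sketch below assumes, as the text above suggests, that "reciprocal_dispersion" plays the role of size; the helper nb_variance is hypothetical and not part of rstanarm.

```r
# Variance of a negative binomial with mean mu and size parameter `size`
# (the parameterization used by rnbinom with the `mu` argument).
nb_variance <- function(mu, size) mu + mu^2 / size

mu <- 10
sizes <- c(0.5, 5, 50, 500)
variances <- nb_variance(mu, sizes)
print(data.frame(size = sizes, variance = variances))

# Smaller size means more overdispersion relative to the mean.
stopifnot(all(diff(variances) < 0))

# As size grows, the variance approaches mu, the Poisson variance.
stopifnot(abs(nb_variance(mu, 1e8) - mu) < 1e-4)
```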

prior_PD

A logical scalar (defaulting to FALSE) indicating whether to draw from the prior predictive distribution instead of conditioning on the outcome.

algorithm

A string (possibly abbreviated) indicating the estimation approach to use. Can be "sampling" for MCMC (the default), "optimizing" for optimization, "meanfield" for variational inference with independent normal distributions, or "fullrank" for variational inference with a multivariate normal distribution. See rstanarm-package for more details on the estimation algorithms. NOTE: not all fitting functions support all four algorithms.

adapt_delta

Only relevant if algorithm="sampling". See adapt_delta for details.

QR

A logical scalar (defaulting to FALSE). If TRUE, a scaled QR decomposition is applied to the design matrix, $X = Q^\ast R^\ast$, where $Q^\ast = Q \sqrt{n-1}$ and $R^\ast = \frac{1}{\sqrt{n-1}} R$. The coefficients relative to $Q^\ast$ are obtained and then premultiplied by the inverse of $R^\ast$ to obtain coefficients relative to the original predictors, $X$. These transformations do not change the likelihood of the data but are recommended for computational reasons when there are multiple predictors. However, because the prior argument applies to the coefficients relative to $Q^\ast$ when QR is TRUE (and those are not very interpretable), it is hard to specify an informative prior. Setting QR=TRUE is therefore only recommended if you do not have an informative prior for the regression coefficients.
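
The scaled decomposition described above can be checked numerically in base R. The following is a minimal sketch, not rstanarm's internal code: it uses the built-in mtcars data and ordinary least squares in place of the Stan fit, and the names Q_star, R_star, theta, and beta are illustrative.

```r
# Predictors from the built-in mtcars data (no intercept column).
X <- as.matrix(mtcars[, c("wt", "hp", "disp")])
y <- mtcars$mpg
n <- nrow(X)

# Thin QR decomposition, followed by the rescaling described above.
decomp <- qr(X)
Q_star <- qr.Q(decomp) * sqrt(n - 1)
R_star <- qr.R(decomp) / sqrt(n - 1)

# The product reproduces X, so the likelihood is unchanged.
stopifnot(isTRUE(all.equal(Q_star %*% R_star, X, check.attributes = FALSE)))

# Least-squares coefficients relative to Q_star (a stand-in for the Stan fit).
theta <- qr.solve(Q_star, y)

# Premultiply by the inverse of R_star to get coefficients relative to X.
beta <- drop(backsolve(R_star, theta))

# Same result as regressing y on X directly.
beta_direct <- unname(coef(lm(y ~ X - 1)))
stopifnot(isTRUE(all.equal(beta, beta_direct)))
```

The columns of Q_star are orthogonal with a common scale, which is the property that tends to help computation; the price is that theta has no direct interpretation in terms of the original predictors.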

sparse

A logical scalar (defaulting to FALSE) indicating whether to use a sparse representation of the design (X) matrix. Setting this to TRUE will likely be twice as slow, even if the design matrix has a considerable number of zeros, but it may allow the model to be estimated when the computer has too little RAM to utilize a dense design matrix. If TRUE, the design matrix is not centered (since that would destroy the sparsity), and it is not possible to specify both QR = TRUE and sparse = TRUE.
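
Why centering "would destroy the sparsity" can be seen with a toy dense matrix in base R. This is only an illustration with made-up data, not how rstanarm stores sparse matrices: subtracting a nonzero column mean turns every zero entry in that column into a nonzero one.

```r
# A toy design matrix: 100 rows, 5 columns, nonzero only in the first 10 rows.
X <- matrix(0, nrow = 100, ncol = 5)
X[1:10, ] <- 1:10  # each column holds 1, 2, ..., 10 followed by 90 zeros

# Centering subtracts each column mean (0.55 here) from every entry.
X_centered <- scale(X, center = TRUE, scale = FALSE)

zero_share_before <- mean(X == 0)
zero_share_after <- mean(X_centered == 0)

# 90% of the entries are zero before centering and none are afterwards.
stopifnot(isTRUE(all.equal(zero_share_before, 0.9)))
stopifnot(zero_share_after == 0)
```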

link

For stan_glm.nb only, the link function to use. See neg_binomial_2.

prior_ops

Deprecated. See rstanarm-deprecated for details.

group

A list, possibly of length zero (the default), but otherwise having the structure of that produced by mkReTrms to indicate the group-specific part of the model. In addition, this list must have elements for the regularization, concentration, shape, and scale components of a decov prior for the covariance matrices among the group-specific coefficients.

Value

A stanreg object is returned for stan_glm and stan_glm.nb. A stanfit object (or a slightly modified stanfit object) is returned if stan_glm.fit is called directly.

Details

The stan_glm function is similar in syntax to glm, but rather than performing maximum likelihood estimation of generalized linear models, full Bayesian estimation is performed (if algorithm is "sampling") via MCMC. The Bayesian model adds priors (independent by default) on the coefficients of the GLM. The stan_glm function calls the workhorse stan_glm.fit function, but it is also possible to call the latter directly. The stan_glm.nb function, which takes the extra argument link, is a wrapper for stan_glm with family = neg_binomial_2(link).

References

Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, Cambridge, UK. (Ch. 3-6)

Examples

library(rstanarm)  # provides stan_glm, the prior functions, and the wells data

if (!grepl("^sparc", R.version$platform)) {
### Linear regression
fit <- stan_glm(mpg / 10 ~ ., data = mtcars, QR = TRUE,
                algorithm = "fullrank") # only to make example fast enough
plot(fit, prob = 0.5)
plot(fit, prob = 0.5, pars = "beta")
}

### Logistic regression
head(wells)
wells$dist100 <- wells$dist / 100
fit2 <- stan_glm(
  switch ~ dist100 + arsenic, 
  data = wells, 
  family = binomial(link = "logit"), 
  prior = student_t(df = 7, location = 0, scale = 2.5), 
  prior_intercept = normal(0, 10),
  chains = 1, iter = 250 # for speed
)
print(fit2)
prior_summary(fit2)

plot(fit2, plotfun = "areas", prob = 0.9, # ?bayesplot::mcmc_areas
     pars = c("(Intercept)", "arsenic"))
pp_check(fit2, plotfun = "error_binned")  # ?bayesplot::ppc_error_binned


### Poisson regression (example from help("glm")) 
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
fit3 <- stan_glm(counts ~ outcome + treatment, family = poisson(link="log"),
                 prior = normal(0, 1), prior_intercept = normal(0, 5),
                 chains = 1, iter = 250) # for speed
print(fit3)

bayesplot::color_scheme_set("green")
plot(fit3)
plot(fit3, regex_pars = c("outcome", "treatment"))
plot(fit3, plotfun = "combo", regex_pars = "treatment") # ?bayesplot::mcmc_combo

### Gamma regression (example from help("glm"))
clotting <- data.frame(log_u = log(c(5,10,15,20,30,40,60,80,100)),
                       lot1 = c(118,58,42,35,27,25,21,19,18),
                       lot2 = c(69,35,26,21,18,16,13,12,12))
fit4 <- stan_glm(lot1 ~ log_u, data = clotting, family = Gamma,
                 chains = 1, iter = 250) # for speed 
print(fit4, digits = 2)
fit5 <- update(fit4, formula = lot2 ~ log_u)

### Negative binomial regression
fit6 <- stan_glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = MASS::quine, 
                    link = "log", prior_aux = exponential(1/2),
                    QR = TRUE, chains = 1, iter = 250) # for speed

bayesplot::color_scheme_set("brightblue")
plot(fit6)
pp_check(fit6, plotfun = "hist", nreps = 5)

# 80% interval of estimated reciprocal_dispersion parameter
posterior_interval(fit6, pars = "reciprocal_dispersion", prob = 0.8)
plot(fit6, "areas", pars = "reciprocal_dispersion", prob = 0.8)
