# stan_glm

##### Bayesian generalized linear models via Stan

Generalized linear modeling with optional prior distributions for the coefficients, intercept, and auxiliary parameters.

##### Usage

```
stan_glm(formula, family = gaussian(), data, weights, subset,
         na.action = NULL, offset = NULL, model = TRUE, x = FALSE, y = TRUE,
         contrasts = NULL, ..., prior = normal(), prior_intercept = normal(),
         prior_aux = cauchy(0, 5), prior_PD = FALSE,
         algorithm = c("sampling", "optimizing", "meanfield", "fullrank"),
         adapt_delta = NULL, QR = FALSE, sparse = FALSE)

stan_glm.nb(formula, data, weights, subset, na.action = NULL, offset = NULL,
            model = TRUE, x = FALSE, y = TRUE, contrasts = NULL, link = "log",
            ..., prior = normal(), prior_intercept = normal(),
            prior_aux = cauchy(0, 5), prior_PD = FALSE,
            algorithm = c("sampling", "optimizing", "meanfield", "fullrank"),
            adapt_delta = NULL, QR = FALSE)

stan_glm.fit(x, y, weights = rep(1, NROW(x)), offset = rep(0, NROW(x)),
             family = gaussian(), ..., prior = normal(),
             prior_intercept = normal(), prior_aux = cauchy(0, 5),
             prior_ops = NULL, group = list(), prior_PD = FALSE,
             algorithm = c("sampling", "optimizing", "meanfield", "fullrank"),
             adapt_delta = NULL, QR = FALSE, sparse = FALSE)
```

##### Arguments

- formula, data, subset
- Same as `glm`.
- family
- Same as `glm`, except negative binomial GLMs are also possible using the `neg_binomial_2` family object.
- na.action, contrasts
- Same as `glm`, but rarely specified.
- model, offset, weights
- Same as `glm`.
- x, y
- In `stan_glm` and `stan_glm.nb`, logical scalars indicating whether to return the design matrix and response vector. In `stan_glm.fit`, a design matrix and response vector.
- ...
- Further arguments passed to the function in the rstan package (`sampling`, `vb`, or `optimizing`) corresponding to the estimation method named by `algorithm`. For example, if `algorithm` is `"sampling"` it is possible to specify `iter`, `chains`, `cores`, `refresh`, etc.
- prior
- The prior distribution for the regression coefficients. `prior` should be a call to one of the various functions provided by rstanarm for specifying priors. The subset of these functions that can be used for the prior on the coefficients can be grouped into several "families":

  | Family | Functions |
  | --- | --- |
  | *Student t family* | `normal`, `student_t`, `cauchy` |
  | *Hierarchical shrinkage family* | `hs`, `hs_plus` |
  | *Laplace family* | `laplace`, `lasso` |
  | *Product normal family* | `product_normal` |

  See the priors help page for details on the families and how to specify the arguments for all of the functions in the table above. To omit a prior, i.e., to use a flat (improper) uniform prior, `prior` can be set to `NULL`, although this is rarely a good idea.

  **Note:** Unless `QR=TRUE`, if `prior` is from the Student t family or Laplace family, and if the `autoscale` argument to the function used to specify the prior (e.g. `normal`) is left at its default and recommended value of `TRUE`, then the default or user-specified prior scale(s) may be adjusted internally based on the scales of the predictors. See the priors help page for details on the rescaling and the `prior_summary` function for a summary of the priors used for a particular model.
- prior_intercept
- The prior distribution for the intercept. `prior_intercept` can be a call to `normal`, `student_t` or `cauchy`. See the priors help page for details on these functions. To omit a prior on the intercept, i.e., to use a flat (improper) uniform prior, `prior_intercept` can be set to `NULL`.

  **Note:** If using a dense representation of the design matrix, i.e., if the `sparse` argument is left at its default value of `FALSE`, then the prior distribution for the intercept is set so it applies to the value when all predictors are centered.
- prior_aux
- The prior distribution for the "auxiliary" parameter (if applicable). The "auxiliary" parameter refers to a different parameter depending on the `family`. For Gaussian models `prior_aux` controls `"sigma"`, the error standard deviation. For negative binomial models `prior_aux` controls `"reciprocal_dispersion"`, which is similar to the `"size"` parameter of `rnbinom`: smaller values of `"reciprocal_dispersion"` correspond to greater dispersion. For gamma models `prior_aux` sets the prior on the `"shape"` parameter (see e.g., `rgamma`), and for inverse-Gaussian models it is the so-called `"lambda"` parameter (which is essentially the reciprocal of a scale parameter). Binomial and Poisson models do not have auxiliary parameters. `prior_aux` can be a call to `exponential` to use an exponential distribution, or `normal`, `student_t` or `cauchy`, which results in a half-normal, half-t, or half-Cauchy prior. See `priors` for details on these functions. To omit a prior, i.e., to use a flat (improper) uniform prior, set `prior_aux` to `NULL`.
- prior_PD
- A logical scalar (defaulting to `FALSE`) indicating whether to draw from the prior predictive distribution instead of conditioning on the outcome.
- algorithm
- A string (possibly abbreviated) indicating the estimation approach to use. Can be `"sampling"` for MCMC (the default), `"optimizing"` for optimization, `"meanfield"` for variational inference with independent normal distributions, or `"fullrank"` for variational inference with a multivariate normal distribution. See `rstanarm-package` for more details on the estimation algorithms. NOTE: not all fitting functions support all four algorithms.
- adapt_delta
- Only relevant if `algorithm="sampling"`. See `adapt_delta` for details.
- QR
- A logical scalar (defaulting to `FALSE`). If `TRUE`, a scaled `qr` decomposition is applied to the design matrix, $X = Q^\ast R^\ast$, where $Q^\ast = Q \sqrt{n-1}$ and $R^\ast = \frac{1}{\sqrt{n-1}} R$. The coefficients relative to $Q^\ast$ are obtained and then premultiplied by the inverse of $R^\ast$ to obtain coefficients relative to the original predictors, $X$. These transformations do not change the likelihood of the data but are recommended for computational reasons when there are multiple predictors. However, because the `prior` argument applies to the coefficients relative to $Q^\ast$ when `QR` is `TRUE` (and those are not very interpretable), it is hard to specify an informative prior. Setting `QR=TRUE` is therefore only recommended if you do not have an informative prior for the regression coefficients.
- sparse
- A logical scalar (defaulting to `FALSE`) indicating whether to use a sparse representation of the design (X) matrix. Setting this to `TRUE` will likely be twice as slow, even if the design matrix has a considerable number of zeros, but it may allow the model to be estimated when the computer has too little RAM to utilize a dense design matrix. If `TRUE`, the design matrix is not centered (since that would destroy the sparsity) and it is not possible to specify both `QR = TRUE` and `sparse = TRUE`.
- link
- For `stan_glm.nb` only, the link function to use. See `neg_binomial_2`.
- prior_ops
- Deprecated. See rstanarm-deprecated for details.
- group
- A list, possibly of length zero (the default), but otherwise having the structure of that produced by `mkReTrms` to indicate the group-specific part of the model. In addition, this list must have elements for the `regularization`, `concentration`, `shape`, and `scale` components of a `decov` prior for the covariance matrices among the group-specific coefficients.
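
The scaled QR transformation described under the `QR` argument can be checked numerically. The following is a minimal base-R sketch, not rstanarm's internal code: it assumes ordinary least squares in place of Bayesian estimation, leaves out the intercept and any centering, and uses simulated data. The names `Q_star`, `R_star`, `theta`, and `beta` are chosen here purely for illustration.

```r
# Illustrative sketch of the scaled QR reparameterization (base R only).
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 3), nrow = n, ncol = 3)  # simulated design matrix
y <- drop(X %*% c(1, -2, 0.5)) + rnorm(n)       # simulated outcome

decomp <- qr(X)
Q_star <- qr.Q(decomp) * sqrt(n - 1)  # Q* = Q (n-1)^0.5
R_star <- qr.R(decomp) / sqrt(n - 1)  # R* = (n-1)^(-0.5) R

# The product Q* R* reproduces X, so the fitted values are unchanged.
qr_reconstructs_X <- isTRUE(all.equal(Q_star %*% R_star, X,
                                      check.attributes = FALSE))

# Least-squares coefficients relative to Q*. Because
# t(Q*) %*% Q* = (n-1) I, they reduce to t(Q*) %*% y / (n-1).
theta <- crossprod(Q_star, y) / (n - 1)

# Premultiply by the inverse of R* to get coefficients relative to X.
beta <- drop(backsolve(R_star, theta))

# Reference: least squares computed directly on X.
beta_direct <- unname(lm.fit(X, y)$coefficients)

print(qr_reconstructs_X)
print(all.equal(beta, beta_direct))
```

The sketch also suggests why an informative prior is hard to state when `QR = TRUE`: `theta` is a linear combination of all entries of `beta`, so a prior on `theta` does not map onto any single original predictor.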

##### Details

The `stan_glm` function is similar in syntax to `glm`, but rather than performing maximum likelihood estimation of generalized linear models, full Bayesian estimation is performed (if `algorithm` is `"sampling"`) via MCMC. The Bayesian model adds priors (independent by default) on the coefficients of the GLM. The `stan_glm` function calls the workhorse `stan_glm.fit` function, but it is also possible to call the latter directly.

The `stan_glm.nb` function, which takes the extra argument `link`, is a wrapper for `stan_glm` with `family = neg_binomial_2(link)`.
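
The wrapper relationship can be illustrated by fitting the same model both ways. This is a sketch under assumptions: the formula `Days ~ Sex + Age` is a simplified one picked for speed, the MASS package is assumed to be installed for the `quine` data, and `seed` and `refresh` are assumed to be forwarded to rstan's `sampling` through `...`. With so few iterations the estimates are noisy, so the two fits should only be expected to agree up to Monte Carlo error.

```r
# Sketch: stan_glm.nb() versus the equivalent stan_glm() call.
library(rstanarm)

# Negative binomial fit through the wrapper ...
fit_nb <- stan_glm.nb(Days ~ Sex + Age, data = MASS::quine, link = "log",
                      chains = 1, iter = 250, seed = 123, refresh = 0)

# ... and through stan_glm() with the family given explicitly.
fit_glm <- stan_glm(Days ~ Sex + Age, data = MASS::quine,
                    family = neg_binomial_2(link = "log"),
                    chains = 1, iter = 250, seed = 123, refresh = 0)

# Both fits report the same family and the same set of coefficients.
family(fit_nb)$family
family(fit_glm)$family
cbind(wrapper = coef(fit_nb), explicit = coef(fit_glm))
```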

##### Value

##### References

Gelman, A. and Hill, J. (2007). *Data Analysis Using
Regression and Multilevel/Hierarchical Models.* Cambridge University Press,
Cambridge, UK. (Ch. 3-6)

##### See Also

`stanreg-methods` and `glm`.

The various vignettes for `stan_glm`.

##### Examples

```
library(rstanarm)  # provides stan_glm() and the wells data

if (!grepl("^sparc", R.version$platform)) {
  ### Linear regression
  fit <- stan_glm(mpg / 10 ~ ., data = mtcars, QR = TRUE,
                  algorithm = "fullrank") # only to make example fast enough
  plot(fit, prob = 0.5)
  plot(fit, prob = 0.5, pars = "beta")
}

### Logistic regression
head(wells)
wells$dist100 <- wells$dist / 100
fit2 <- stan_glm(
  switch ~ dist100 + arsenic,
  data = wells,
  family = binomial(link = "logit"),
  prior = student_t(df = 7, location = 0, scale = 2.5),
  prior_intercept = normal(0, 10),
  chains = 1, iter = 250 # for speed
)
print(fit2)
prior_summary(fit2)
plot(fit2, plotfun = "areas", prob = 0.9, # ?bayesplot::mcmc_areas
     pars = c("(Intercept)", "arsenic"))
pp_check(fit2, plotfun = "error_binned") # ?bayesplot::ppc_error_binned

### Poisson regression (example from help("glm"))
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit3 <- stan_glm(counts ~ outcome + treatment, family = poisson(link = "log"),
                 prior = normal(0, 1), prior_intercept = normal(0, 5),
                 chains = 1, iter = 250) # for speed
print(fit3)
bayesplot::color_scheme_set("green")
plot(fit3)
plot(fit3, regex_pars = c("outcome", "treatment"))
plot(fit3, plotfun = "combo", regex_pars = "treatment") # ?bayesplot::mcmc_combo

### Gamma regression (example from help("glm"))
clotting <- data.frame(log_u = log(c(5, 10, 15, 20, 30, 40, 60, 80, 100)),
                       lot1 = c(118, 58, 42, 35, 27, 25, 21, 19, 18),
                       lot2 = c(69, 35, 26, 21, 18, 16, 13, 12, 12))
fit4 <- stan_glm(lot1 ~ log_u, data = clotting, family = Gamma,
                 chains = 1, iter = 250) # for speed
print(fit4, digits = 2)
fit5 <- update(fit4, formula = lot2 ~ log_u)

### Negative binomial regression
fit6 <- stan_glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = MASS::quine,
                    link = "log", prior_aux = exponential(1/2),
                    QR = TRUE, chains = 1, iter = 250) # for speed
bayesplot::color_scheme_set("brightblue")
plot(fit6)
pp_check(fit6, plotfun = "hist", nreps = 5)
# 80% interval of estimated reciprocal_dispersion parameter
posterior_interval(fit6, pars = "reciprocal_dispersion", prob = 0.8)
plot(fit6, "areas", pars = "reciprocal_dispersion", prob = 0.8)
```
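
To help read the `reciprocal_dispersion` interval from the last example, the base-R sketch below shows how the variance of a negative binomial outcome changes with the `size` argument of `rnbinom`. It assumes, for illustration only, that `reciprocal_dispersion` can be treated exactly like `size` (the argument description says only that the two are similar). The mean `mu` and the grid of `size` values are arbitrary choices.

```r
# Sketch: smaller size (reciprocal dispersion) means greater dispersion.
mu   <- 10                      # common mean for every row
size <- c(0.5, 2, 10, 100)      # candidate reciprocal-dispersion values

# Variance of the negative binomial in the (mu, size) parameterization
# documented in ?rnbinom: mu + mu^2 / size.
nb_var <- mu + mu^2 / size

# Simulation check of the analytic variances.
set.seed(42)
sim_var <- sapply(size, function(s) var(rnbinom(1e5, mu = mu, size = s)))

data.frame(reciprocal_dispersion = size, mean = mu,
           variance = nb_var, simulated_variance = round(sim_var, 1))
```

As `size` grows the variance approaches the mean, which is the Poisson limit; small values give variances far above the mean.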

*Documentation reproduced from package rstanarm, version 2.14.1, License: GPL (>= 3)*