# brm

##### Fit Bayesian Generalized Linear and Ordinal Mixed Models

Fit a Bayesian generalized linear or ordinal mixed model using Stan

##### Usage

```
brm(formula, data = NULL, family = c("gaussian", "identity"),
prior = list(), addition = NULL, autocor = NULL, partial = NULL,
threshold = "flexible", cov.ranef = NULL, ranef = TRUE,
predict = FALSE, fit = NA, n.chains = 2, n.iter = 2000,
n.warmup = 500, n.thin = 1, n.cluster = 1, inits = "random",
silent = FALSE, seed = 12345, save.model = NULL, engine = "stan", ...)
```

##### Arguments

- formula
- An object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'.
- data
- An optional data frame, list or environment (or object coercible by
`as.data.frame`

to a data frame) containing the variables in the model. If not found in data, the variables are taken from`environment(formula)`

, typically the e - family
- A vector of one or two character strings. The first string indicates the distribution of the dependent variable (the 'family'). Currently, the following families are supported:
`"gaussian"`

,`"student"`

,`"cauchy"`

, - prior
- A named list of character strings specifing the prior distributions of the parameters. Further information is provided under 'Details'.
- addition
- A named list of one sided formulas each containing additional information on the response variable. The following names are allowed:
`se`

for specifying standard errors for meta-analysis,`weights`

to fit weighted regression models, - autocor
- An optional
`cor.brms`

object describing the correlation structure within the response variable (i.e. the 'autocorrelation'). See the documentation of`cor.brms`

- partial
- A one sided formula of the form
`~partial.effects`

specifing the predictors that can vary between categories in non-cumulative ordinal models (i.e. in families`"cratio"`

,`"sratio"`

, or`"acat"`

). - threshold
- A character string indicating the type of thresholds (i.e. intercepts) used in an ordinal model.
`"flexible"`

provides the standard unstructured thresholds and`"equidistant"`

restricts the distance between consecutive thresholds to - cov.ranef
- A list of matrices that are proportional to the (within) covariance structure of the random effects.
The names of the matrices should correspond to columns in
`data`

that are used as grouping factors. All levels of the grouping factor should - ranef
- A flag to indicate if random effects for each level of the grouping factor(s) should be saved (default is
`TRUE`

). Set to`FALSE`

to save memory. The argument has no impact on the model fitting itself. - predict
- A flag to indicate if posterior predictives of the dependent variable should be generated.
- fit
- An instance of S3 class
`brmsfit`

derived from a previous fit; defaults to`NA`

. If`fit`

is of class`brmsfit`

, the compiled model associated with the fitted result is re-used and the arguments`formula<`

- n.chains
- Number of Markov chains (default: 2)
- n.iter
- Number of total iterations per chain (including burnin; default: 2000)
- n.warmup
- A positive integer specifying number of warmup (aka burnin) iterations. This also specifies the number of iterations used for stepsize adaptation,
so warmup samples should not be used for inference. The number of warmup should not be larger than
`n.`

- n.thin
- Thinning rate. Must be a positive integer. Set
`n.thin > 1`

to save memory and computation time if`n.iter`

is large. Default is 1, that is no thinning. - n.cluster
- Number of clusters to use to run parallel chains. Default is 1.
- inits
- A list with
`n.chains`

elements; each element of the list is itself a list of starting values for the model, or a function creating (possibly random) initial values. If inits is`"random"`

(the default), Stan will generate initial v - silent
- logical; If
`TRUE`

, most intermediate output from Stan is suppressed. - seed
- Positive integer. Used by
`set.seed`

to make results reproducable. - save.model
- Either
`NULL`

or a character string. In the latter case, the model code is saved in a file named after the string supplied in`save.model`

, which may also contain the full path where to save the file. If only a name is given, the f - engine
- A character string, either
`"stan"`

(the default) or`"jags"`

. Specifies which program should be used to fit the model. Note that`jags`

is currently implemented for testing purposes only, does not allow full functionalit - ...
- Further arguments to be passed to Stan.

##### Details

Fit a generalized linear mixed model, which incorporates both fixed-effects parameters and random effects in a linear predictor
via full bayesian inference using Stan. During warmup aka burnin phase, Stan may print out quite a few informational
messages that
`"the current Metropolis proposal is about the be rejected ..."`

.
These messages can be ignored in nearly all cases.
Use `silent = TRUE`

to stop these messages from being printed out.
**Formula syntax**
The `formula`

argument accepts formulas of the following syntax:
`response | addition ~ fixed + (random | group)`

Multiple grouping factors each with multiple random effects are possible. With the exception of `addition`

,
this is basically `lme4`

syntax.
The optional `addition`

term may contain multiple terms of the form `fun(variable)`

seperated by `|`

each providing
special information on the response variable. `fun`

can be replaced with either `se`

, `weights`

, `trials`

,
`cat`

, or `cens`

(their meanings are explained below). Using the `addition`

term in `formula`

is equivalent
to using argument `addition`

: Instead of writing `fun(variable)`

in `formula`

, we may use `addition = list(fun = ~variable)`

.
For families `gaussian`

, `student`

, and `cauchy`

it is possible to specify
standard errors of the observation, thus allowing to perform meta-analysis.
Suppose that the variable `yi`

contains the effect sizes from the studies and `sei`

the
corresponding standard errors. Then, fixed and random effects meta-analyses can be conducted
using the formulae `yi | se(sei) ~ 1`

and `yi | se(sei) ~ 1 + (1|study)`

, respectively, where
`study`

is a variable uniquely identifying every study.
If desired, meta-regressen can be performed via `yi | se(sei) ~ 1 + mod1 + mod2 + (1|study)`

or
`yi | se(sei) ~ 1 + mod1 + mod2 + (1 + mod1 + mod2|study)`

, where
`mod1`

and `mod2`

represent moderator variables.
For all families, weighted regression may be performed using
`weights`

in the addition part. Suppose that variable `wei`

contains the weights and that `yi`

is the response variable.
Then, formula `yi | weights(wei) ~ predictors`

implements a weighted regression.
For family `binomial`

, addition may contain a variable indicating the number of trials
underlying each observation. In `lme4`

syntax, we may write for instance
`cbind(success, n - success)`

, which is equivalent
to `success | trials(n)`

in `brms`

syntax. If the number of trials
is constant across all observation (say `10`

), we may also write `success | trials(10)`

.
For family `categorical`

and all ordinal families, `addition`

may contain a term `cat(categories)`

to
specify the number categories for each observation, either with a variable name (e.g, `categories`

in this example) or a single number.
With the expection of `categorical`

and ordinal families, left and right censoring can be modeled through `yi | cens(censored) ~ predictors`

.
The censoring variable (named `censored`

in this example) should contain the values `'left'`

, `'none'`

, and `'right'`

(or equivalenty -1, 0, and 1) to indicate that the corresponding observation is left censored, not censored, or right censored.
Mutiple `addition`

terms may be specified at the same time, for instance
`formula = yi | se(sei) | cens(censored) ~ 1`

for a censored meta-analytic model, equivalent to
`formula = yi ~ 1`

and `addition = list(se = ~sei, cens = ~censored)`

when using argument `addition`

.
**Families and link functions**
Family `gaussian`

with `identity`

link leads to linear regression. Families `student`

, and `cauchy`

with `identity`

link leads to robust linear regression that is less influenced by outliers.
Families `poisson`

, `negbinomial`

, and `geometric`

with `log`

link lead to regression models for count data.
Family `binomial`

with `logit`

link leads to logistic regression and family `categorical`

to
multi-logistic regression when there are more than two possible outcomes.
Families `cumulative`

, `cratio`

('contiuation ratio'), `sratio`

('stopping ratio'),
and `acat`

('adjacent category') leads to ordinal regression. Families `gamma`

, `weibull`

, and `exponential`

can be used (among others) for survival regression when combined with the `log`

link.
In the following, we list all possible links for each family.
The families `gaussian`

, `student`

, and `cauchy`

accept the links (as names) `identity`

, `log`

, and `inverse`

;
families `poisson`

, `negbinomial`

, and `geometric`

the links `log`

, `identity`

, and `sqrt`

;
families `binomial`

, `cumulative`

, `cratio`

, `sratio`

, and `acat`

the links `logit`

, `probit`

, `probit_approx`

, and `cloglog`

;
family `categorical`

the link `logit`

; families `gamma`

, `weibull`

, and `exponential`

the links `log`

, `identity`

, and `inverse`

.
The first link mentioned for each family is the default.
**Prior distributions**
Below, we describe the usage of the `prior`

argument and list some common prior distributions
for parameters in `brms`

models.
A complete overview on possible prior distributions is given in the Stan Reference Manual available at
`brm`

performs no checks if the priors are written in correct Stan language.
Instead, Stan will check their correctness when the model is parsed to C++ and returns an error if they are not.
Currently, there are five types of parameters in `brms`

models,
for which the user can specify prior distributions.
1. Fixed effects
Every fixed (and partial) effect has its corresponding regression parameter. These parameters are named as
`b_(fixed)`

, where `(fixed)`

represents the name of the corresponding fixed effect.
Suppose, for instance, that `y`

is predicted by `x1`

and `x2`

(i.e. `y ~ x1+x2`

in formula syntax).
Then, `x1`

and `x2`

have regression parameters `b_x1`

and `b_x2`

respectively.
The default prior for fixed effects parameters is an improper flat prior over the reals.
Other common options are normal priors or uniform priors over a finite interval.
If we want to have a normal prior with mean 0 and standard deviation 5 for `b_x1`

,
and a uniform prior between -10 and 10 for `b_x2`

,
we can specify this via
`prior = list(b_x1 = "normal(0,5)", b_x2 = "uniform(-10,10)")`

.
To put the same prior (e.g. a normal prior) on all fixed effects at once,
we may write as a shortcut `prior =`

`list(b = "normal(0,5)")`

. In addition, this
leads to faster sampling in Stan, because priors can be vectorized.
2. Autocorrelation parameters
The autocorrelation parameters currently implemented are named `ar`

(autoregression) and `ma`

(moving average).
The default prior for autocorrelation parameters is an improper flat prior over the reals. It should be noted that `ar`

will
only take one values between -1 and 1 if the response variable is wide-sence stationay, i.e. if there is no drift in the responses.
3. Standard deviations of random effects
Each random effect of each grouping factor has a standard deviation named
`sd_(group)_(random)`

. Consider, for instance, the formula `y ~ x1+x2+(1+x1|z)`

.
We see that the intercept as well as `x1`

are random effects nested in the grouping factor `z`

.
The corresponding standard deviation parameters are named as `sd_z_Intercept`

and `sd_z_x1`

respectively.
These parameters are restriced to be non-negative and, by default,
have a half cauchy prior with 'mean' 0 and 'standard deviation' 5.
We could make this explicit by writing `prior = list(sd = "cauchy(0,5)")`

.
One common alternative is a uniform prior over a positive interval.
4. Correlations of random effects
If there is more than one random effect per grouping factor, the correlations between those random
effects have to be estimated.
However, in `brms`

models, the corresponding correlation matrix $C$ does not have prior itself.
Instead, a prior is defined for the cholesky factor $L$ of $C$. They are related through the equation
$$L * L' = C.$$
The prior `"lkj_corr_cholesky(eta)"`

with `eta > 0`

is essentially the only prior for
cholesky factors of correlation matrices.
If `eta = 1`

(the default) all correlations matrices are equally likely a priori. If `eta > 1`

,
extreme correlations become less likely,
whereas `0 < eta < 1`

results in higher probabilities for extreme correlations.
The cholesky factors in `brms`

models are named as
`L_(group)`

, (e.g., `L_z`

if `z`

is the grouping factor).
5. Parameters for specific families
Some families need additional parameters to be estimated.
Families `gaussian`

, `student`

, and `cauchy`

need the parameter `sigma`

to account for the standard deviation of the response variable around the regression line
(not to be confused with the standard deviations of random effects).
By default, `sigma`

has an improper flat prior over the positiv reals.
Furthermore, family `student`

needs the parameter `nu`

representing
the degrees of freedom of students t distribution.
By default, `nu`

has prior `"uniform(1,60)"`

.
Families `gamma`

and `weibull`

need the parameter `shape`

that has a `"gamma(0.01,0.01)"`

prior by default. For families `cumulative`

, `cratio`

, `sratio`

,
and `acat`

, and only if `threshold = "equidistant"`

, the parameter `delta`

is used to model the distance
between to adjacent thresholds. By default, `delta`

has an improper flat prior over the reals.

##### Value

- An object of class
`brmsfit`

, which contains the posterior samples along with many other useful information about the model. If rstan is not installed,`brmsfit`

will not contain posterior samples.

##### Examples

```
## Poisson Regression for the number of seizures in epileptic patients
## using half cauchy priors for standard deviations of random effects
fit_e <- brm(count ~ log_Age_c + log_Base4_c * Trt_c + (1|patient) + (1|visit),
data = epilepsy, family = "poisson", prior = list(sd = "cauchy(0,2.5)"))
## generate a summary of the results
summary(fit_e)
## plot the MCMC chains as well as the posterior distributions
plot(fit_e)
## extract random effects standard devations, correlation and covariance matrices
VarCorr(fit_e)
## extract random effects for each level
ranef(fit_e)
## Ordinal regression (with family 'sratio') modeling patient's rating
## of inhaler instructions using normal priors for fixed effects parameters
fit_i <- brm(rating ~ treat + period + carry, data = inhaler,
family = "sratio", prior = list(b = "normal(0,5)"))
summary(fit_i)
plot(fit_i)
## Surivival Regression (with family 'weibull') modeling time between
## first and second recurrence of an infection in kidney patients
## time | cens indicates which values in variable time are right censored
fit_k <- brm(time | cens(censored) ~ age + sex + disease, data = kidney,
family = "weibull", silent = TRUE, inits = "0")
summary(fit_k)
plot(fit_k)
```

*Documentation reproduced from package brms, version 0.3.0, License: GPL (>= 2)*