brm: Fit Bayesian Generalized Linear and Ordinal Mixed Models

Description

Fit a Bayesian generalized linear or ordinal mixed model using Stan

Usage

brm(formula, data = NULL, family = c("gaussian", "identity"),
  prior = list(), partial = NULL, threshold = "flexible", ranef = TRUE,
  predict = FALSE, fit = NA, n.chains = 2, n.iter = 2000,
  n.warmup = 500, n.thin = 1, n.cluster = 1, inits = "random",
  silent = FALSE, seed = 12345, save.model = NULL, engine = "stan",
  pars = "auto", post.pred = FALSE, ...)

Arguments

formula

An object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'.

data

An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the e

family

A vector of one or two character strings. The first string indicates the distribution of the dependent variable (the 'family'). Currently, the following families are supported: "gaussian", "student", "cauchy",

prior

A named list of character strings specifing the prior distributions of the parameters. Further information is provided under 'Details'.

partial

A one sided formula of the form ~partial.effects specifing the predictors that can vary between categories in non-cumulative ordinal models (i.e. in families "cratio", "sratio", or "acat").

threshold

A character string indicating the type of thresholds (i.e. intercepts) used in an ordinal model. "flexible" provides the standard unstructured thresholds and "equidistant" restricts the distance between consecutive thresholds to

ranef

A flag to indicate if random effects for each level of the grouping factor(s) should be saved (default is TRUE). Set to FALSE to save memory.

predict

A flag to indicate if posterior predictives of the dependent variable should be generated.

fit

An instance of S3 class brmsfit derived from a previous fit; defaults to NA. If fit is of class brmsfit, the compiled model associated with the fitted result is re-used and the arguments formula,

n.chains

Number of Markov chains (default: 2)

n.iter

Number of total iterations per chain (including burnin; default: 2000)

n.warmup

A positive integer specifying number of warmup (aka burnin) iterations. This also specifies the number of iterations used for stepsize adaptation, so warmup samples should not be used for inference. The number of warmup should not be larger than n.

n.thin

Thinning rate. Must be a positive integer. Set n.thin > 1 to save memory and computation time if n.iter is large. Default is 1, that is no thinning.

n.cluster

Number of clusters to use to run parallel chains. Default is 1.

inits

A list with n.chains elements; each element of the list is itself a list of starting values for the model, or a function creating (possibly random) initial values. If inits is NULL (the default), Stan will generate initial value

silent

logical; If TRUE, most intermediate output from Stan is suppressed.

seed

Positive integer. Used by set.seed to make results reproducable.

save.model

Either NULL or a character string. In the latter case, the model code is saved in a file with its name specified by save.model in the current working directory.

engine

A character string, either "stan" (the default) or "jags". Specifies which program should be used to fit the model. Note that jags is currently implemented for testing purposes only, does not allow full functionalit

pars

deprecated; all relevant parameters including random effects for each level are now returned by default.

post.pred

A deprecated alias for argument predict.

...

Further arguments to be passed to Stan.

Value

An object of class brmsfit, which contains the posterior samples along with many other useful information about the model. If rstan is not installed, brmsfit will not contain posterior samples.

Details

Fit a generalized linear mixed model, which incorporates both fixed-effects parameters and random effects in a linear predictor via full bayesian inference using Stan. Formula syntax The formula argument accepts formulas of the following syntax: response | addition ~ fixed + (random | group) Multiple grouping factors each with multiple random effects are possible. With the exception of addition, this is basically lme4 syntax. The optional argument addition has different meanings depending on the family argument. For families gaussian, student, and cauchy it may be a variable specifying the standard errors of the observation, thus allowing to perform meta-analysis. Suppose that the variable yi contains the effect sizes from the studies and sei the corresponding standard errors. Then, fixed and random effects meta-analyses can be conducted using the formulae yi | sei ~ 1 and yi | sei ~ 1 + (1|study), respectively, where study is a variable uniquely identifying every study. If desired, meta-regressen can be performed via yi | sei ~ 1 + mod1 + mod2 + (1|study) or yi | sei ~ 1 + mod1 + mod2 + (1 + mod1 + mod2|study), where mod1 and mod2 represent moderator variables. For family binomial, addition may be a variable indicating the number of trials underlying each observation. In lme4 syntax, we may write for instance cbind(success, trials - success), which is equivalent to success | trials in brms syntax. If the number of trials is constant across all observation (say 10), we may also write success | 10. For family categorical and all ordinal families, addition specifies the number of categories for each observation, either with a variable name or a single number. For families gamma, exponential, and weibull, addition may contain a logical variable (or a variable than can be coerced to logical) indicating if the response variable is left censored (corresponding to TRUE) or not censored (corresponding to FALSE). Families and link functions Family gaussian with identity link leads to linear regression. Families student, and cauchy with identity link leads to robust linear regression that is less influenced by outliers. Families poisson, negbinomial, and geometric with log link lead to regression models for count data. Family binomial with logit link leads to logistic regression and family categorical to multi-logistic regression when there are more than two possible outcomes. Families cumulative, cratio ('contiuation ratio'), sratio ('stopping ratio'), and acat ('adjacent category') leads to ordinal regression. Families gamma, weibull, and exponential can be used (among others) for survival regression when combined with the log link. In the following, we list all possible links for each family. The families gaussian, student, and cauchy accept the links (as names) identity, log, and inverse; families poisson, negbinomial, and geometric the links log, identity, and sqrt; families binomial, cumulative, cratio, sratio, and acat the links logit, probit, probit_approx, and cloglog; family categorical the link logit; families gamma, weibull, and exponential the links log, identity, and inverse. The first link mentioned for each family is the default. Prior distributions Below, we describe the usage of the prior argument and list some common prior distributions for parameters in brms models. A complete overview on possible prior distributions is given in the Stan Reference Manual available at http://mc-stan.org/. brm performs no checks if the priors are written in correct Stan language. Instead, Stan will check their correctness when the model is parsed to C++ and returns an error if they are not. Currently, there are four types of parameters in brms models, for which the user can specify prior distributions. 1. Fixed effects Every fixed (and partial) effect has its corresponding regression parameter. These parameters are named as b_(fixed), where (fixed) represents the name of the corresponding fixed effect. Suppose, for instance, that y is predicted by x1 and x2 (i.e. y ~ x1+x2 in formula syntax). Then, x1 and x2 have regression parameters b_x1 and b_x2 respectively. The default prior for fixed effects parameters is an improper flat prior over the reals. Other common options are normal priors or uniform priors over a finite interval. If we want to have a normal prior with mean 0 and standard deviation 5 for b_x1, and a uniform prior between -10 and 10 for b_x2, we can specify this via prior = list(b_x1 = "normal(0,5)", b_x2 = "uniform(-10,10)"). To put the same prior (e.g. a normal prior) on all fixed effects at once, we may write as a shortcut prior = list(b = "normal(0,5)"). In addition, this leads to faster sampling in Stan, because priors can be vectorized. 2. Standard deviations of random effects Each random effect of each grouping factor has a standard deviation named sd_(group)_(random). Consider, for instance, the formula y ~ x1+x2+(1+x1|z). We see that the intercept as well as x1 are random effects nested in the grouping factor z. The corresponding standard deviation parameters are named as sd_z_Intercept and sd_z_x1 respectively. These parameters are restriced to be non-negative and, by default, have a half cauchy prior with 'mean' 0 and 'standard deviation' 5. We could make this explicit by writing prior = list(sd = "cauchy(0,5)"). One common alternative is a uniform prior over a positive interval. 3. Correlations of random effects If there is more than one random effect per grouping factor, the correlations between those random effects have to be estimated. However, in brms models, the corresponding correlation matrix $C$ does not have prior itself. Instead, a prior is defined for the cholesky factor $L$ of $C$. They are related through the equation $$L * L' = C.$$ The prior "lkj_corr_cholesky(eta)" with eta > 0 is essentially the only prior for cholesky factors of correlation matrices. If eta = 1 (the default) all correlations matrices are equally likely a priori. If eta > 1, extreme correlations become less likely, whereas 0 < eta < 1 results in higher probabilities for extreme correlations. The cholesky factors in brms models are named as L_(group), (e.g., L_z if z is the grouping factor). 4. Parameters for specific families Some families need additional parameters to be estimated. Families gaussian, student, and cauchy need the parameter sigma to account for the standard deviation of the response variable around the regression line (not to be confused with the standard deviations of random effects). By default, sigma has an improper flat prior over the positiv reals. Furthermore, family student needs the parameter nu representing the degrees of freedom of students t distribution. By default, nu has prior "uniform(1,60)". Families gamma and weibull need the parameter shape that has a "gamma(0.01,0.01)" prior by default.

Examples

Run this code

## Poisson Regression for the number of seizures in epileptic patients
## using half cauchy priors for standard deviations of random effects
fit_e <- brm(count ~ log_Age_c + log_Base4_c * Trt_c + (1|patient) + (1|visit),
           data = epilepsy, family = "poisson", prior = list(sd = "cauchy(0,2.5)"))
## generate a summary of the results
summary(fit_e)
## plot the MCMC chains as well as the posterior distributions
plot(fit_e)
## extract random effects standard devations, correlation and covariance matrices
VarCorr(fit_e)
## extract random effects for each level
ranef(fit_e)

## Ordinal regression (with family 'sratio') modeling patient's rating
## of inhaler instructions using normal priors for fixed effects parameters
fit_i <- brm(rating ~ treat + period + carry, data = inhaler,
              family = "sratio", prior = list(b = "normal(0,5)"))
summary(fit_i)
plot(fit_i)

## Surivival Regression (with family 'weibull') modeling time between
## first and second recurrence of an infection in kidney patients
## time | cens indicates which values in variable time are left censored
fit_k <- brm(time | cens ~ age + sex + disease, data = kidney,
             family = "weibull", silent = TRUE)
summary(fit_k)
plot(fit_k)

Run the code above in your browser using DataLab