brm: Fit Bayesian Generalized Linear and Ordinal Mixed Models

Description

Fit a Bayesian generalized linear or ordinal mixed model using Stan

Usage

brm(formula, data = NULL, family = c("gaussian", "identity"),
  prior = NULL, addition = NULL, autocor = NULL, partial = NULL,
  threshold = c("flexible", "equidistant"), cov.ranef = NULL,
  ranef = TRUE, sample.prior = FALSE, fit = NA, inits = "random",
  n.chains = 2, n.iter = 2000, n.warmup = 500, n.thin = 1,
  n.cluster = 1, cluster_type = "PSOCK", silent = FALSE, seed = 12345,
  save.model = NULL, ...)

Arguments

formula

An object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'.

data

An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the e

family

A vector of one or two character strings. The first string indicates the distribution of the dependent variable (the 'family'). Currently, the following families are supported: "gaussian", "student", "cauchy",

prior

One or more brmsprior objects created by function set_prior and combined using the c method. A single brmsprior object may be passed without c()

addition

A named list of one sided formulas each containing additional information on the response variable. The following names are allowed: se for specifying standard errors for meta-analysis, weights to fit weighted regression models,

autocor

An optional cor_brms object describing the correlation structure within the response variable (i.e. the 'autocorrelation'). See the documentation of cor_brms

partial

A one sided formula of the form ~expression specifying the predictors with category specific effects in non-cumulative ordinal models (i.e. in families "cratio", "sratio", or "acat").

threshold

A character string indicating the type of thresholds (i.e. intercepts) used in an ordinal model. "flexible" provides the standard unstructured thresholds and "equidistant" restricts the distance between consecutive thresholds to

cov.ranef

A list of matrices that are proportional to the (within) covariance structure of the random effects. The names of the matrices should correspond to columns in data that are used as grouping factors. All levels of the grouping factor should

ranef

A flag to indicate if random effects for each level of the grouping factor(s) should be saved (default is TRUE). Set to FALSE to save memory. The argument has no impact on the model fitting itself.

sample.prior

A flag to indicate if samples from all specified proper priors should be additionally drawn. Among others, these samples can be used to calculate Bayes factors for point hypotheses. Default is FALSE.

fit

An instance of S3 class brmsfit derived from a previous fit; defaults to NA. If fit is of class brmsfit, the compiled model associated with the fitted result is re-used and the arguments formula<

inits

Either "random" or "0". If inits is "random" (the default), Stan will randomly generate initial values for parameters. If it is "0", all parameters are initiliazed to zero. This option is recommended fo

n.chains

Number of Markov chains (default: 2)

n.iter

Number of total iterations per chain (including burnin; default: 2000)

n.warmup

A positive integer specifying number of warmup (aka burnin) iterations. This also specifies the number of iterations used for stepsize adaptation, so warmup samples should not be used for inference. The number of warmup should not be larger than n.

n.thin

Thinning rate. Must be a positive integer. Set n.thin > 1 to save memory and computation time if n.iter is large. Default is 1, that is no thinning.

n.cluster

Number of clusters to use to run parallel chains. Default is 1.

cluster_type

A character string specifying the type of cluster created by makeCluster when sampling in parallel (i.e. when n.cluster is greater 1). Default is "PS

silent

logical; If TRUE, most intermediate output from Stan is suppressed.

seed

Positive integer. Used by set.seed to make results reproducable.

save.model

Either NULL or a character string. In the latter case, the model code is saved in a file named after the string supplied in save.model, which may also contain the full path where to save the file. If only a name is given, the f

...

Further arguments to be passed to Stan.

Value

An object of class brmsfit, which contains the posterior samples along with many other useful information about the model. Use methods(class = "brmsfit") for an overview on available methods.

Details

Fit a generalized linear mixed model, which incorporates both fixed-effects parameters and random effects in a linear predictor via full bayesian inference using Stan. During warmup aka burnin phase, Stan may print out quite a few informational messages that "the current Metropolis proposal is about the be rejected ...". These messages can be ignored in nearly all cases. Use silent = TRUE to stop these messages from being printed out. Formula syntax The formula argument accepts formulas of the following syntax: response | addition ~ fixed + (random | group) Multiple grouping factors each with multiple random effects are possible. Instead of | you may use || in random effects terms to prevent random effects correlations from being modeled. With the exception of addition, this is basically lme4 syntax. The optional addition term may contain multiple terms of the form fun(variable) seperated by | each providing special information on the response variable. fun can be replaced with either se, weights, trials, cat, or cens (their meanings are explained below). Using the addition term in formula is equivalent to using argument addition: Instead of writing fun(variable) in formula, we may use addition = list(fun = ~variable). For families gaussian, student, and cauchy it is possible to specify standard errors of the observation, thus allowing to perform meta-analysis. Suppose that the variable yi contains the effect sizes from the studies and sei the corresponding standard errors. Then, fixed and random effects meta-analyses can be conducted using the formulae yi | se(sei) ~ 1 and yi | se(sei) ~ 1 + (1|study), respectively, where study is a variable uniquely identifying every study. If desired, meta-regressen can be performed via yi | se(sei) ~ 1 + mod1 + mod2 + (1|study) or yi | se(sei) ~ 1 + mod1 + mod2 + (1 + mod1 + mod2|study), where mod1 and mod2 represent moderator variables. For all families, weighted regression may be performed using weights in the addition part. Suppose that variable wei contains the weights and that yi is the response variable. Then, formula yi | weights(wei) ~ predictors implements a weighted regression. For family binomial, addition may contain a variable indicating the number of trials underlying each observation. In lme4 syntax, we may write for instance cbind(success, n - success), which is equivalent to success | trials(n) in brms syntax. If the number of trials is constant across all observation (say 10), we may also write success | trials(10). For family categorical and all ordinal families, addition may contain a term cat(number) to specify the number categories (e.g, cat(7). If not given, the number of categories is calculated from the data. With the expection of categorical and ordinal families, left and right censoring can be modeled through yi | cens(censored) ~ predictors. The censoring variable (named censored in this example) should contain the values 'left', 'none', and 'right' (or equivalenty -1, 0, and 1) to indicate that the corresponding observation is left censored, not censored, or right censored. Mutiple addition terms may be specified at the same time, for instance formula = yi | se(sei) | cens(censored) ~ 1 for a censored meta-analytic model, equivalent to formula = yi ~ 1 and addition = list(se = ~sei, cens = ~censored) when using argument addition. Family gaussian allows to perform multivariate (normal) regression using cbind notation. Suppose that y1 and y2 are response variables and x is a predictor, then cbind(y1,y2) ~ x speficies a multivariate model, where x has the same effect on y1 and y2. To indicate different effects on each response variable, the word trait (which is reserved in models with mutiple responses) can be used as an additional categorical predictor. For instance, cbind(y1,y2) ~ -1 + x:trait leads to seperate effects of x on y1 and y2. In this case, trait has two levels, namely "y1" and "y2". By default, trait is dummy-coded. It may also be used within random effects terms, both as grouping factor or as random effect within a grouping factor. Note that variable trait is generated internally and may not be specified in the data passed to brm. Families and link functions Family gaussian with identity link leads to linear regression. Families student, and cauchy with identity link leads to robust linear regression that is less influenced by outliers. Families poisson, negbinomial, and geometric with log link lead to regression models for count data. Families binomial and bernoulli with logit link leads to logistic regression and family categorical to multi-logistic regression when there are more than two possible outcomes. Families cumulative, cratio ('contiuation ratio'), sratio ('stopping ratio'), and acat ('adjacent category') leads to ordinal regression. Families gamma, weibull, and exponential can be used (among others) for survival regression when combined with the log link. In the following, we list all possible links for each family. The families gaussian, student, and cauchy accept the links (as names) identity, log, and inverse; families poisson, negbinomial, and geometric the links log, identity, and sqrt; families binomial, bernoulli, cumulative, cratio, sratio, and acat the links logit, probit, probit_approx, and cloglog; family categorical the link logit; families gamma, weibull, and exponential the links log, identity, and inverse. The first link mentioned for each family is the default. Prior distributions As of brms 0.5.0, priors should be specified using the set_prior function. Its documentation contains detailed information on how to correctly specify priors.

Examples

Run this code

## Poisson Regression for the number of seizures in epileptic patients
## using uniform priors for fixed effects
## and half cauchy priors for standard deviations of random effects
fit_e <- brm(count ~ log_Age_c + log_Base4_c * Trt_c + (1|patient) + (1|visit),
           data = epilepsy, family = "poisson",
           prior = c(set_prior("uniform(-10,10)", class = "b"),
                     set_prior("cauchy(0,2)", class = "sd")))
## generate a summary of the results
summary(fit_e)
## plot the MCMC chains as well as the posterior distributions
plot(fit_e)
## extract random effects standard devations, correlation and covariance matrices
VarCorr(fit_e)
## extract random effects for each level
ranef(fit_e)

## Ordinal regression (with family 'sratio') modeling patient's rating
## of inhaler instructions using normal priors for fixed effects parameters
fit_i <- brm(rating ~ treat + period + carry, data = inhaler,
              family = "sratio", prior = set_prior("normal(0,5)"))
summary(fit_i)
plot(fit_i)

## Surivival Regression (with family 'weibull') modeling time between
## first and second recurrence of an infection in kidney patients
## time | cens indicates which values in variable time are right censored
fit_k <- brm(time | cens(censored) ~ age + sex + disease, data = kidney,
             family = "weibull", silent = TRUE, inits = "0")
summary(fit_k)
plot(fit_k)

## Logisitic Regression (with family 'binomial') to illustrate model specification
## variable n contains the number of trials, success contains the number of successes
n <- sample(1:10, 100, TRUE)
success <- rbinom(100, size = n, prob = 0.4)
x <- rnorm(100)
fit_b <- brm(success | trials(n) ~ x, family = "binomial")
summary(fit_b)

Run the code above in your browser using DataLab