stan_mvmer: Bayesian multivariate generalized linear models with correlated group-specific terms via Stan

Description

Bayesian inference for multivariate GLMs with group-specific coefficients that are assumed to be correlated across the GLM submodels.

Usage

stan_mvmer(
  formula,
  data,
  family = gaussian,
  weights,
  prior = normal(autoscale = TRUE),
  prior_intercept = normal(autoscale = TRUE),
  prior_aux = cauchy(0, 5, autoscale = TRUE),
  prior_covariance = lkj(autoscale = TRUE),
  prior_PD = FALSE,
  algorithm = c("sampling", "meanfield", "fullrank"),
  adapt_delta = NULL,
  max_treedepth = 10L,
  init = "random",
  QR = FALSE,
  sparse = FALSE,
  ...
)

Value

A stanmvreg object is returned.

Arguments

formula

A two-sided linear formula object describing both the fixed-effects and random-effects parts of the longitudinal submodel similar in vein to formula specification in the lme4 package (see glmer or the lme4 vignette for details). Note however that the double bar (||) notation is not allowed when specifying the random-effects parts of the formula, and neither are nested grouping factors (e.g. (1 | g1/g2)) or (1 | g1:g2), where g1, g2 are grouping factors. For a multivariate GLM this should be a list of such formula objects, with each element of the list providing the formula for one of the GLM submodels.

data

A data frame containing the variables specified in formula. For a multivariate GLM, this can be either a single data frame which contains the data for all GLM submodels, or it can be a list of data frames where each element of the list provides the data for one of the GLM submodels.

family

The family (and possibly also the link function) for the GLM submodel(s). See glmer for details. If fitting a multivariate GLM, then this can optionally be a list of families, in which case each element of the list specifies the family for one of the GLM submodels. In other words, a different family can be specified for each GLM submodel.

weights

Same as in glm, except that when fitting a multivariate GLM and a list of data frames is provided in data then a corresponding list of weights must be provided. If weights are provided for one of the GLM submodels, then they must be provided for all GLM submodels.

prior, prior_intercept, prior_aux

Same as in stan_glmer except that for a multivariate GLM a list of priors can be provided for any of prior, prior_intercept or prior_aux arguments. That is, different priors can optionally be specified for each of the GLM submodels. If a list is not provided, then the same prior distributions are used for each GLM submodel. Note that the "product_normal" prior is not allowed for stan_mvmer.

prior_covariance

Cannot be NULL; see priors for more information about the prior distributions on covariance matrices. Note however that the default prior for covariance matrices in stan_mvmer is slightly different to that in stan_glmer (the details of which are described on the priors page).

prior_PD

A logical scalar (defaulting to FALSE) indicating whether to draw from the prior predictive distribution instead of conditioning on the outcome.

algorithm

A string (possibly abbreviated) indicating the estimation approach to use. Can be "sampling" for MCMC (the default), "optimizing" for optimization, "meanfield" for variational inference with independent normal distributions, or "fullrank" for variational inference with a multivariate normal distribution. See rstanarm-package for more details on the estimation algorithms. NOTE: not all fitting functions support all four algorithms.

adapt_delta

Only relevant if algorithm="sampling". See the adapt_delta help page for details.

max_treedepth

A positive integer specifying the maximum treedepth for the non-U-turn sampler. See the control argument in stan.

init

The method for generating initial values. See stan.

QR

A logical scalar defaulting to FALSE, but if TRUE applies a scaled qr decomposition to the design matrix. The transformation does not change the likelihood of the data but is recommended for computational reasons when there are multiple predictors. See the QR-argument documentation page for details on how rstanarm does the transformation and important information about how to interpret the prior distributions of the model parameters when using QR=TRUE.

sparse

A logical scalar (defaulting to FALSE) indicating whether to use a sparse representation of the design (X) matrix. If TRUE, the the design matrix is not centered (since that would destroy the sparsity) and likewise it is not possible to specify both QR = TRUE and sparse = TRUE. Depending on how many zeros there are in the design matrix, setting sparse = TRUE may make the code run faster and can consume much less RAM.

...

Further arguments passed to the function in the rstan package (sampling, vb, or optimizing), corresponding to the estimation method named by algorithm. For example, if algorithm is "sampling" it is possible to specify iter, chains, cores, and other MCMC controls.

Another useful argument that can be passed to rstan via ... is refresh, which specifies how often to print updates when sampling (i.e., show the progress every refresh iterations). refresh=0 turns off the iteration updates.

Details

The stan_mvmer function can be used to fit a multivariate generalized linear model (GLM) with group-specific terms. The model consists of distinct GLM submodels, each which contains group-specific terms; within a grouping factor (for example, patient ID) the grouping-specific terms are assumed to be correlated across the different GLM submodels. It is possible to specify a different outcome type (for example a different family and/or link function) for each of the GLM submodels.

Bayesian estimation of the model is performed via MCMC, in the same way as for stan_glmer. Also, similar to stan_glmer, an unstructured covariance matrix is used for the group-specific terms within a given grouping factor, with priors on the terms of a decomposition of the covariance matrix.See priors for more information about the priors distributions that are available for the covariance matrices, the regression coefficients and the intercept and auxiliary parameters.

Examples

Run this code

if (.Platform$OS.type != "windows" || .Platform$r_arch !="i386") {
# \donttest{
#####
# A multivariate GLM with two submodels. For the grouping factor 'id', the 
# group-specific intercept from the first submodel (logBili) is assumed to
# be correlated with the group-specific intercept and linear slope in the 
# second submodel (albumin)
f1 <- stan_mvmer(
        formula = list(
          logBili ~ year + (1 | id), 
          albumin ~ sex + year + (year | id)),
        data = pbcLong, 
        # this next line is only to keep the example small in size!
        chains = 1, cores = 1, seed = 12345, iter = 1000)
summary(f1) 

#####
# A multivariate GLM with one bernoulli outcome and one
# gaussian outcome. We will artificially create the bernoulli
# outcome by dichotomising log serum bilirubin
pbcLong$ybern <- as.integer(pbcLong$logBili >= mean(pbcLong$logBili))
f2 <- stan_mvmer(
        formula = list(
          ybern ~ year + (1 | id), 
          albumin ~ sex + year + (year | id)),
        data = pbcLong,
        family = list(binomial, gaussian),
        chains = 1, cores = 1, seed = 12345, iter = 1000)
# }
}