baggr (version 0.2.0)

baggr: Bayesian aggregate treatment effects model

Description

Bayesian inference on parameters of an average treatment effects model that's appropriate to the supplied individual- or group-level data, using Hamiltonian Monte Carlo in Stan. (For overall package help file see baggr-package)

Usage

baggr(data, model = NULL, pooling = "partial",
  prior_hypermean = NULL, prior_hypersd = NULL,
  prior_hypercor = NULL, prior = NULL, ppd = FALSE,
  test_data = NULL, quantiles = seq(0.05, 0.95, 0.1),
  outcome = "outcome", group = "group", treatment = "treatment",
  warn = TRUE, ...)

Arguments

data

data frame with summary or individual level data to meta-analyse

model

if NULL, the model is detected automatically from the input data; otherwise choose from "rubin", "mutau", "individual", "quantiles" (see Details).

pooling

Type of pooling; choose from "none", "partial" (the default) and "full". If you are not familiar with the terms, consult the vignette; "partial" can be understood as random effects and "full" as fixed effects.

prior_hypermean

prior distribution for hypermean; you can use "plain text" notation like prior_hypermean=normal(0,100) or uniform(-10, 10). See Details:Priors below for more possible specifications. If unspecified, the priors will be derived automatically based on data (and printed out in the console).

prior_hypersd

prior for hyper-standard deviation, used by the "rubin" and "mutau" models; the same rules apply as for prior_hypermean

prior_hypercor

prior for hypercorrelation matrix, used by the "mutau" model

prior

alternative way to specify all priors as a named list with hypermean, hypersd, hypercor, e.g. prior = list(hypermean = normal(0,10))

ppd

logical; use the prior predictive distribution (p.p.d.)? Defaults to FALSE. If ppd=TRUE, the Stan model will sample from the prior distributions and ignore data in inference. However, the data argument may still be used to infer the correct model and to set the default priors.

test_data

data for cross-validation; NULL for no validation, otherwise a data frame with the same columns as data argument

quantiles

if model = "quantiles", a vector indicating which quantiles of data to use (with values between 0 and 1)

outcome

character; column name in (individual-level) data with outcome variable values

group

character; column name in data with grouping factor. It is required for individual-level data; for summarised data it is used to label the groups when displaying results

treatment

character; column name in (individual-level) data with treatment factor;

warn

print an additional warning if Rhat exceeds 1.05

...

extra options passed to Stan function, e.g. control = list(adapt_delta = 0.99), number of iterations etc.

Value

An object of class baggr: a list including the Stan model fit alongside input data, pooling metrics and various model properties. If test data is used, the mean value of -2*lpd is reported as mean_lpd

Details

Running baggr requires 1/ data preparation, 2/ choice of model, 3/ choice of priors. All three are discussed in depth in the package vignette (vignette("baggr")).

Data. For aggregate data models you need a data frame with columns tau and se or tau, mu, se.tau, se.mu. An additional column can be used to provide labels for each group (by default column group is used if available, but this can be customised -- see the example below). For individual level data three columns are needed: outcome, treatment, group. These are identified by using the outcome, treatment and group arguments.
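For concreteness, the two aggregate-data layouts described above might look like this (a sketch with made-up numbers; only the column names come from the documentation):

```r
# Rubin model layout: one effect estimate (tau) and its standard error (se)
# per group; the "group" column supplies labels.
df_rubin <- data.frame(tau   = c(0.8, -0.2, 0.4),
                       se    = c(0.5, 0.4, 0.6),
                       group = c("Site A", "Site B", "Site C"))

# "mutau" layout: additionally records the control-group mean (mu)
# and separate standard errors for mu and tau.
df_mutau <- data.frame(tau    = c(0.8, -0.2, 0.4),
                       mu     = c(1.1, 0.9, 1.0),
                       se.tau = c(0.5, 0.4, 0.6),
                       se.mu  = c(0.3, 0.3, 0.4),
                       group  = c("Site A", "Site B", "Site C"))
```

Passing either data frame to baggr() should then let the model type be detected from the columns present.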

When working with individual-level data, many data preparation steps (summarising, standardisation etc.) can be done through the helper function prepare_ma. Using it will also automatically format data inputs to work with baggr().

Models. Available models are:

  • for the means: "rubin" model for average treatment effect, "mutau" version which takes into account means of control groups, "full", which works with individual-level data

  • "quantiles" model is also available (see Meager, 2019 in references)

If no model is specified, the function tries to infer the appropriate model automatically. Additionally, the user can specify the type of pooling; the default is always partial pooling.

Priors. Specifying priors yourself is optional: if you do not pass any prior arguments, the package will propose appropriate priors based on the input data. To set the priors yourself, use the prior_ arguments. To specify many priors at once (or to re-use them between models), a single prior = list(...) argument can be used instead. Appropriate examples are given in vignette("baggr").
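As a sketch, the two interfaces express the same thing (the prior values here are illustrative, and the distribution constructors such as normal() are the baggr helpers shown in the Examples, so the actual calls are left as comments):

```r
# Names recognised in the `prior` list, matching the prior_* arguments:
prior_names <- c("hypermean", "hypersd", "hypercor")

# (a) individual arguments:
#   baggr(df, prior_hypermean = normal(0, 10), prior_hypersd = normal(0, 5))

# (b) one named list, convenient to re-use across several fits:
#   my_priors <- list(hypermean = normal(0, 10), hypersd = normal(0, 5))
#   baggr(df, prior = my_priors)
```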

Examples

# NOT RUN {
df_pooled <- data.frame("tau" = c(1, -1, .5, -.5, .7, -.7, 1.3, -1.3),
                        "se" = rep(1, 8),
                        "state" = datasets::state.name[1:8])
baggr(df_pooled) # baggr automatically detects the input data
# correct labels, different pooling & passing some options to Stan
baggr(df_pooled, group = "state", pooling = "full", iter = 500)
# change the priors:
baggr(df_pooled, prior_hypermean = normal(5, 5))

# "mu & tau" model, using a built-in dataset
# prepare_ma() can summarise individual-level data
# }
# NOT RUN {
microcredit_summary_data <- prepare_ma(microcredit_simplified,
                                       outcome = "consumerdurables")
baggr(microcredit_summary_data, model = "mutau",
      pooling = "partial", prior_hypercor = lkj(1),
      prior_hypersd = normal(0,10),
      prior_hypermean = multinormal(c(0,0),matrix(c(10,3,3,10),2,2)))
# }