loocv: Leave-one-group-out cross-validation for `baggr` models

Description

Performs exact leave-one-group-out cross-validation on a baggr model.

Usage

loocv(data, return_models = FALSE, ...)

Value

Log predictive density value, an object of class baggr_cv. The full model, prior values and lpd of each model are also returned; these can be examined using the attributes() function.

Arguments

data: Input data frame, in the same format as for the baggr function.
return_models: logical; if FALSE, summary statistics are returned and the models discarded; if TRUE, a list of models is returned alongside the summaries.
...: Additional arguments passed to baggr.

Author

Witold Wiecek

Details

The values returned by loocv() can be used to understand how excluding any one group affects the overall result, as well as how well the model predicts the omitted group. LOO-CV is good general practice for comparing Bayesian models, not only in meta-analysis.

This function automatically runs K baggr models, where K is the number of groups (e.g. studies), leaving out one group at a time. For each run, it calculates the expected log predictive density (ELPD) for the omitted group (see Gelman et al. 2014). (In the logistic model, where the proportion in the control group is unknown, each group is divided into data for controls, which are kept for estimation, and data for treated units, which are not used for estimation but only for calculating the predictive density. This is akin to fixing the baseline risk and only trying to infer the odds ratio.)
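A minimal base-R sketch of the leave-one-group-out loop, under stated assumptions: it uses hypothetical summary-level data (one estimate `tau` and its standard error `se` per group) and replaces the K MCMC fits with a complete-pooling normal model with a flat prior on the common effect, whose posterior predictive density has a closed form. The helper `logo_elpd()` and the `toy` data are hypothetical and not part of baggr.

```r
# Hypothetical toy data: one effect estimate (tau) and standard error (se) per group.
toy <- data.frame(
  group = paste("Group", 1:5),
  tau   = c(2.0, 1.5, 3.1, 0.4, 2.6),
  se    = c(1.0, 0.8, 1.5, 1.2, 0.9)
)

# Exact leave-one-group-out ELPD for a complete-pooling normal model with a
# flat prior on the common effect mu. This is an analytic stand-in for the
# K MCMC fits that loocv() runs, not how baggr itself fits models.
# Requires at least 2 groups, so that each fit has some data left.
logo_elpd <- function(tau, se) {
  K <- length(tau)
  elpd <- numeric(K)
  for (k in seq_len(K)) {
    w      <- 1 / se[-k]^2                # precision weights of the retained groups
    mu_hat <- sum(w * tau[-k]) / sum(w)   # posterior mean of mu without group k
    mu_var <- 1 / sum(w)                  # posterior variance of mu without group k
    # Posterior predictive for the held-out estimate: N(mu_hat, mu_var + se_k^2)
    elpd[k] <- dnorm(tau[k], mean = mu_hat,
                     sd = sqrt(mu_var + se[k]^2), log = TRUE)
  }
  elpd
}

elpd  <- logo_elpd(toy$tau, toy$se)
looic <- -2 * sum(elpd)   # smaller values indicate better out-of-sample fit
print(round(elpd, 3))
print(looic)
```

Each term comes from a fit that never saw the held-out group, so a group that disagrees with the others gets a low ELPD and raises the criterion. The partial-pooling models fitted by loocv() follow the same logic, with MCMC draws in place of the closed form.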

The main output is the cross-validation information criterion, i.e. -2 times the sum of the ELPD values over the K models. (The terms are summed because they are on the log scale: a sum of log predictive densities is the log of their product.) This criterion is related to, and often approximated by, the Watanabe-Akaike Information Criterion (WAIC). When comparing models, smaller values indicate a better fit. For more information on cross-validation, see the overview article by Gelman, Hwang and Vehtari listed under References.
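In symbols, the quantities above can be written as follows (notation is ours, following the general definitions in Gelman et al.; that baggr evaluates the integral exactly by the Monte Carlo average shown is an assumption). Write $y_k$ for the data of group $k$, $y_{-k}$ for the data of all other groups, and $\theta^{(s)}$, $s = 1, \dots, S$, for posterior draws from the fit that excludes group $k$:

```latex
\mathrm{elpd}_k = \log p(y_k \mid y_{-k})
  = \log \int p(y_k \mid \theta)\, p(\theta \mid y_{-k})\, d\theta
  \approx \log\left( \frac{1}{S} \sum_{s=1}^{S} p\bigl(y_k \mid \theta^{(s)}\bigr) \right),
\qquad
\mathrm{LOOIC} = -2 \sum_{k=1}^{K} \mathrm{elpd}_k .
```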

For more computationally intensive models, consider setting the mc.cores option before running loocv, e.g. options(mc.cores = 4) (by default, baggr runs 4 MCMC chains, which can then run in parallel). By default, rstan runs "silently" (refresh = 0); to see sampling progress, set e.g. loocv(data, refresh = 500).

References

Gelman, Andrew, Jessica Hwang, and Aki Vehtari. 'Understanding Predictive Information Criteria for Bayesian Models.' Statistics and Computing 24, no. 6 (November 2014): 997–1016.

Examples


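if (FALSE) {
# even simple examples may take a while
library(baggr)  # provides loocv() and the schools data set
cv <- loocv(schools, pooling = "partial")
print(cv)       # prints the cross-validation information criterion
attributes(cv)  # per-model lpd values and other details are stored as attributes
}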
