Bayes: Summarize Bayesian Markov Chain Monte Carlo (MCMC) object

Description

Convert a list of Bayesian analysis chains (e.g., coda package mcmc.list objects) into a data frame for analysis and creating plots. For example, models from JAGS or Stan that were converted into coda class objects can be used to create the data frames. Calculates a set of descriptive statistics that summarize MCMC parameters. And values can be calculated for use in descriptive graphs such as values associated with specific percentiles and vice-versa to help set targets, summaries on hierarchical or multilevel models (up to 3 levels), and the R2 for Bayesian regression models with metric level predictors.

Usage

Bayes(
  x,
  y = "mcmc",
  parameter = NULL,
  mass = 0.95,
  compare = NULL,
  rope = NULL,
  newdata = FALSE,
  type = NULL,
  center = "mode",
  data = NULL,
  dv = NULL,
  iv = NULL,
  expand = NULL,
  targets = NULL
)

Value

data frame of summary statistics for the MCMC parameter's distribution and/or MCMC data frame. Statistics include highest density interval, effective sample size, proportion of distribution within and outside of a ROPE, distribution compared with a set value, and the parameter's mean, median, and mode. And distribution summaries for multilevel models, target summaries, and regression model R2.

Arguments

x: list object of multiple MCMC chains (e.g., matrix class list elements or coda mcmc.list).
y: character vector for the type of analysis or output to perform. Select 'post', 'multi', 'target', 'r2', or 'mcmc' for a posterior summary, multilevel/hierarchical model summary (up to 3 levels), target summary, Gelman R-squared statistic, or list object of MCMC chains converted into a data frame. Default is generic 'mcmc'(no analysis, just MCMC creation).
parameter: single or multiple element character vector name of parameter(s) in MCMC chains to produce summary statistics. When y='target', use the generally 2 to 3 parameters that represent the distribution parameters (e.g., parameter= c('mean', 'sd')). When y='r2', use the regression parameters in order, ending with the residual or level-1 variance (e.g., parameter= c('intercept', 'beta1', 'beta2', 'standard_deviation')). Default is NULL.
mass: numeric vector that specifies the credible mass used in the Highest Density Interval (HDI). Default is 0.95.
compare: numeric vector with one comparison value to determine how much of the distribution is above or below the comparison value. Default is NULL.
rope: numeric vector with two values that define the Region of Practical Equivalence (ROPE). Test hypotheses by setting low and high values to determine if the Highest Density Interval (HDI) is within or outside of the ROPE. Parameter value declared not credible if the entire ROPE lies outside the HDI of the parameter’s posterior (i.e., we reject the null hypothesis). For example, the ROPE of a coin is set to 0.45 to 0.55 but the posterior 95% HDI is 0.61 - 0.69 so we reject the null hypothesis value of 0.50. We can accept the null hypothesis if the entire 95% HDI falls with the ROPE. Default is NULL.
newdata: optional logical vector that indicates if you want the new MCMC data returned. When newdata=TRUE, it will return the list object of MCMC chains, converted into a data frame. This data is used for analysis and all plots. Please select newdata=TRUE to produce any graphs but not needed when y='multi'. The default is newdata=FALSE.
type: character vector of length == 1 that indicates the likelihood function used in the model when y='multi' or y='target'. Select 'n', 'ln', 'w', 'g', 't', 'bern', and 'bin' for these respective options in Bayesian estimation (multilevel): 'Normal', 'Log-normal', 'Weibull', 'Gamma', 't', 'Bernoulli', or 'binomial'. Default is NULL.
center: character vector that selects the type of central tendency to use when reporting parameter values when y='post', y='target', or y='r2'. Choices include: 'mean', 'median', and 'mode' when y='post', or 'mean' and 'median' when y='r2'. Default is 'mode' when y='post' or 'target' and 'median' when y='r2'.
data: object name for the observed data when y='multi' or y='r2'. Default is NULL.
dv: character vector of length == 1 for the dependent variable name in the observed data frame when y='multi'. Default is NULL.
iv: character vector of length >= 1 for the independent variable name(s) in the observed data frame when y='multi' or y='r2'. When y='multi', enter the lower to higher level clustering or group names (e.g, for health data, iv=c("patient", "hospital"). When type='taov', enter the name of the test group variable. When y='r2', enter the observed data variable names for the hierarchical or multilevel groups. Default is NULL.
expand: a character vector of length == 1 indicating the variable name to expand aggregated data into non-aggregated data frames when y='multi'. This variable is the denominator that can be used to calculate a rate in the formula numerator/denominator. For example, when the 'numerator' column equals 4 and the 'denominator' column equals 10, then this single row of data is expanded to 10 rows with four values of 1 and six values of 0 when expand='denominator'. Default is NULL.
targets: list of one or two named elements (p, y) with numeric values that represent quantile values (p) in the distribution to return associated outcome values and/or specific outcome values (y) to retrieve associated probabilities. For example, a distribution of harmful hospital readmission rates has an estimated median value of 0.25. Staff are considering 2 types of targets, percentiles (p) of key interest and specific outcome rates (y). They want to know the readmission rate that is at the 40th percentile for a reduced readmission rate (below what is 'average' at the 50th percentile) and the probability greater than a readmission rate of 0.20. They get this information by entering targets=list(p=0.40, y=0.20); calculating 1 - prob(y) from returned results gives them an idea about the effort needed to meet this target of a reduced readmission rate. Select type= one of these options: 'n', 'ln', 'w', 'g', 't', 'bern', 'bin'. Also select parameter= the appropriate center, spread, and possible 3rd shape distribution parameter (e.g., parameter=c('mean', 'sd')). And option to select center= 'mean', 'median', 'mode'. Default is NULL.

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, Second Edition. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers. ISBN 0-8058-0283-5

Gelman, A., Goodrich, B., Gabry, J., & Vehtari, A. (2019). R-squared for Bayesian Regression Models. The American Statistician, 73, 3, 307–309. https://doi.org/10.1080/00031305.2018.1549100

Kruschke, J. (2014). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, Second Edition. New York: Academic Press. ISBN: 9780124058880

Examples

Run this code

# Posterior estimates of length of stay (LOS) #
# What is the 'average' LOS and are we beyond our goal of it being <= 3 days?
blos1 <- Bayes(losmcmc, y="post", parameter="muOfY", newdata=TRUE,
compare=4.5, rope=c(1,3))   #Where are we in relation to 4.5 days?
print(blos1$Posterior.Summary) #looks like LOS is substantially higher than 3 days

# Multilevel or hierarchical model summaries #
# Code below does not run, no 'mcmc_sample' object
# bmulti0 <- Bayes(x=mcmc_sample, parameter=c("theta", "omega","omegaO"),
# y="multi", type="bern", data=mydf, dv="upbin", iv= c("Plant", "Group"))

# Targets for length of stay (LOS) #
# Our administrators ask how far are we from our goals, they ask about targets
# in increments of 5 points of probability or specific days. We answer both.
btarget1 <- Bayes(x=losmcmc, y="target", type="n", parameter=c("muOfY","sigmaOfY"),
newdata=TRUE, target=list(p=c(.35,.4,.45, .5, .55),  y=c(3,4))) # 'newdata' for plots
print(btarget1$Target$Est.Quantile.P)  # 5-point increments
print(btarget1$Target$Est.Prob.GT.Y)   # specific day values, 4 days is plausible

# Gelamn's R^2 #
# the regression model using Base R data, CO2: update ~ conc
bR2 <- Bayes(x=co2mcmc, y='r2', data=CO2, iv="uptake",
parameter=c("b0", "b1", "sigma"), newdata=TRUE)
# bR2 returns various information
print(bR2$R2.Summary$R2)                   # R^2
print(bR2$R2.Summary$Variance.Pred.Y)      # Variance of predicted outcome
print(bR2$R2.Summary$Variance.Residuals)   # Variance of residuals
print(head(bR2$R2.Summary$yPRED))          # Predicted outcomes

Run the code above in your browser using DataLab