plot.Bayes: Bayesian plots for various analyses

Description

Graph Bayesian diagnostic traceplots, density plots on convergence, autocorrelation factor, calculate Monte Carlo Standard Errors, and effective sample sizes. And plot summaries of posterior distributions, posterior predictive checks, summaries on hierarchical or multilevel models (up to 3 levels), and summary graphs of values associated with specific percentiles (and vice-versa) that can be used to help set targets. Plots are developed from Bayes class objects that converted multiple chains into data frames.

Usage

# S3 method for Bayes
plot(
  x,
  y = NULL,
  type = "n",
  parameter = NULL,
  center = "mode",
  mass = 0.95,
  compare = NULL,
  rope = NULL,
  data = NULL,
  dv = NULL,
  iv = NULL,
  add.data = "n",
  group = NULL,
  main = NULL,
  xlab = NULL,
  ylab = NULL,
  xlim = NULL,
  ylim = NULL,
  vlim = NULL,
  curve = FALSE,
  lwd = NULL,
  breaks = 15,
  bcol = NULL,
  lcol = NULL,
  pcol = NULL,
  xpt = NULL,
  tgt = NULL,
  tgtcol = "gray",
  tpline = NULL,
  tpcol = NULL,
  pline = 20,
  pct = 95,
  add.legend = NULL,
  legend = NULL,
  cex = 1,
  cex.lab = NULL,
  cex.axis = NULL,
  cex.main = NULL,
  cex.text = NULL,
  cex.legend = NULL,
  HDItext = 0.7,
  math = "n",
  es = "n",
  subset = NULL,
  level = NULL,
  aorder = TRUE,
  round.c = 2,
  ...
)

Value

plots assessing model quality, posterior distributions, posterior predictive checks, hierarchical or multilevel model summaries, and target summaries.

Arguments

x: Bayes class object.
y: character vector for the type of plot to graph. Select 'post', 'dxa', 'dxd', 'dxg', 'dxt', 'check', 'multi' (specialized version of 'check'), or 'target' for posterior summary, diagnostics (4 'dx' plots produced: autocorrelation factor, density plots on chain convergence, Gelman-Rubin statistic, and traceplot), posterior predictive check, multilevel or hierarchical model summary (up to 3 levels), or target summary plots. Default is 'post'.
type: character vector of length == 1 that indicates the likelihood function used in the model when y='check' or y='multi'. Posterior predictive checks allow us to see how well our estimates match the observed data. These checks are available for Bayesian estimation of outcomes and regression trend lines (with polynomial terms) using various distributions in the likelihood function. Select 'n', 'ln', 'sn', 'w', 'g', 't', 'taov', 'taov1', 'ol', 'oq','oc', 'lnl', 'lnq', 'lnc', 'logl', 'logq', 'logc', 'bern', and 'bin' for these respective options in Bayesian estimation (multilevel): 'Normal', 'Log-normal', 'Skew-normal', 'Weibull', 'Gamma', 't', 't: ANOVA, side view', 't: ANOVA 1 group, side view'; and for regression trend lines: 'OLS: Linear', 'OLS: Quadratic', 'OLS: Cubic', 'Log-normal: Linear', 'Log-normal: Quadratic', 'Log-normal: Cubic', 'Logistic: Linear', 'Logistic: Quadratic', and 'Logistic: Cubic', 'Bernoulli', and 'binomial'. The first 8 selections are for Bayesian estimation of outcomes, the next 9 options were developed to assess regression trend lines from ordinary least squares (OLS), log-normal, and logistic models and also for hierarchical model versions. And the remaining two ('bern' and 'bin') are for when y='multi'. Additional models analogous to 'Generalized Linear Models' can also be graphed on the logit scale using 'OLS' options. For example, plot a logistic model on the logit scale when type='ol' (i.e., view straight trend lines). Or if you prefer viewing results on the probability scale, select type='logl' (i.e., curved lines). And consider using type= 'lnl', 'lnq', 'lnc' for log-normal and Poisson models with lines based on exponentiated values. In general, it is important to note that the observed data may not be on the same scale as the parameter estimates and may not be automatically visible in the graph (i.e., x and y-axis limits may help). When graphing target summary plots that use posterior predictive checks rather than the basic posterior summary graph (plots target values over the parameter estimate of the center such as the mean), enter y='target' and other arguments relevant to y='check'. However, type= 'bern', 'bin', 'sn' are not available but type='n', 'ln', 'w', 'g', or 't' are available. Default is NULL.
parameter: a character vector of length >= 1 or a 2 element list with the name(s) of parameter in MCMC chains to produce summary statistics. Use a 1 element vector to get posterior estimates of a single parameter. Use a 2 or more element vector to estimate the average joint effects of multiple parameters (e.g., average infection rate for interventions A and B when parameter= c('IntA', 'IntB')). Use a 2 element list to perform mathematical calculations of multiple parameters (see 'math' below). For example, use parameter=list('hospital_A', 'hospital_Z') if you want to estimate the difference between the hospital's outcomes. Use parameter= list(c('hospital_A','hospital_B'), ('hospital_Y','hospital_Z')) to estimate how different the combined hospitals A and B values are from the combined Hospital Y and Z values. When y='check', use either a multiple element character vector that represents center, spread, and additional distribution parameters in order of 1st, 2nd, and 3rd distribution parameters. For example, mean and sd for a normal distribution; mean log and sd log of a log-normal dist.; xi, omega, and alpha of a skew-normal distribution; shape, scale, and lambda of a Weibull distribution; shape and rate of a Gamma distribution; and mean, SD and nu (i.e., degrees of freedom) of a t-distribution. Or indicate regression parameters in order (e.g., intercept, Beta 1, Beta 2, etc.). When y='multi', use a multiple element character vector to list the parameter names of the hierarchy, in order of the nesting with the lowest level first (e.g., exams nested in patients nested in hospital). When y='multi', for parameters from multiple groups such as various hospitals, only enter the first unit's prefix of each parameter and the remaining groups will be set up for graphing. For example, parameter=c('theta', 'omega') will plot data for theta[1] to theta[8] and omega[1] to omega[8] for all 8 hospitals as well.
center: character vector that selects the type of central tendency to use when reporting parameter values. Choices include: 'mean', 'median', and 'mode'. Default is 'mode'.
mass: numeric vector that specifies the credible mass used in the Highest Density Interval (HDI). Default is 0.95.
compare: numeric vector with one comparison value to determine how much of the distribution is above or below the comparison value. Default is NULL.
rope: numeric vector with two values that define the Region of Practical Equivalence (ROPE). Test hypotheses by setting low and high values to determine if the Highest Density Interval (HDI) is within or outside of the ROPE. Parameter values are declared not credible if the entire ROPE lies outside the HDI of the parameter’s posterior (i.e., we reject the null hypothesis). For example, the ROPE of a coin is set to 0.45 to 0.55 but the posterior 95% HDI is 0.61 - 0.69 so we reject the null hypothesis value of the rate of a head is 0.50. We can accept the null hypothesis if the entire 95% HDI falls with the ROPE. Default is NULL.
data: object name for the observed data when y='check', y='multi' or y='target'.
dv: character vector of length == 1 for the dependent variable name in the observed data frame when y='check', y='multi' or y='target'. Default is NULL.
iv: character vector of length >= 1 for the independent variable name(s) in the observed data frame when y='check' or y='multi'. When y='multi', enter the lower to higher level clustering or group names (e.g, for health data, iv=c("patient", "hospital"). When type='taov', enter the name of the test group variable (e.g., 'intervention'). Default is NULL.
add.data: character vector of length == 1 to determine the type of observed data added to the plot (to show model fit to data) when y='check' and type= 'ol', 'oq','oc', 'lnl', 'lnq', 'lnc', 'logl', 'logq', or 'logc'. Select 'a', 'u', 'al', 'ul', 'n' for these observed data options: 'All', 'Unit', 'All: Lines', 'Unit: Lines' (unit specific lines are linked), 'none'. Default is 'n' for none or no observed data shown.
group: character list of length == 2 for 1) the grouping variable name and 2) specific group(s) in the observed data frame. This is primarily used for multilevel or hierarchical models when y='check' or y='multi' that the hierarchies are based on (e.g., hospitals nested within health systems).
main: the main title of the plot.
xlab: a character vector label for the x-axis.
ylab: a character vector label for the y-axis.
xlim: specify plot's x-axis limits with a 2 element numeric vector.
ylim: specify plot's y-axis limits with a 2 element numeric vector.
vlim: two element vector to specify limits for minimum and maximum values used to extrapolate posterior lines along the x-axis. For example, when drawing a log-normal distribution, we may want to have our posterior lines fit within a narrower range while having our graph's x-axis limits extend past those lines. If so, the value limits (vlim) help us keep our posterior predictive check lines within desired limits. Default is NULL.
curve: select a curve to display instead of a histogram when y='post'. Default is FALSE.
lwd: select the line width.
breaks: number of breaks in a histogram. Default is 15.
bcol: a single or multiple element character vector to specify the bar or band color(s). When Bayesian estimates and observed values are present, the first colors are for Bayesian estimates while the last colors are observed values. Defaults to, if nothing selected, 'gray', except when y = 'multi' and then no overall HDI is graphed until a color is selected.
lcol: a single or multiple element character vector to specify the line color(s). When Bayesian estimates and observed values are present, the first colors are Bayesian estimates while the last colors are observed values. When multiple lines are needed, single item lines precede multiple use lines. For example, a single comparison value line will be assigned the first lcol while both rope lines will be given the same color of the second lcol when y='post'. Defaults to 'gray' if nothing selected.
pcol: a single or multiple element character vector to specify the point color(s). When Bayesian estimates and observed values are present, the first colors are Bayesian estimates while the last colors are observed values. Defaults to, if nothing selected, 'gray'.
xpt: a numeric vector of single or multiple values that indicate placement of points (+) on the x-axis when y='check'. This is intended for the graphs with predictive checks on Bayesian estimation (i.e., not trend lines). Default is NULL.
tgt: specify 1 or more values on the x- or y-axis of where to add one or more target lines when applicable. Default is NULL.
tgtcol: select one or multiple colors for one or multiple target lines. Default is 'gray'.
tpline: add one or more time point vertical lines using x-axis values. Default is NULL (i.e., no lines).
tpcol: specify a color for the time point line, tpline. Default is NULL.
pline: a numeric vector of length == 1 for the number of random posterior predictive check lines when y='check'. Default is 20.
pct: a numeric integer vector of length == 1 for the percentage of the posterior predictive check heavy tail lines to be drawn when type= 'taov' or 'taov1'. Valid values are 0 < pct < 100. Default is 95 (e.g., 95%).
add.legend: add a legend by selecting the location as "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right", "center". Default is no legend produced if nothing is selected.
legend: a character vector of length >= 1 to appear when y='check', y='multi', and sometimes y='target'. Legends to represent hierarchical estimates and observed values.
cex: A numerical value giving the amount by which plotting text and symbols should be magnified relative to the default of 1.
cex.lab: The magnification to be used for x and y labels relative to the current setting of cex.
cex.axis: The magnification to be used for axis annotation relative to the current setting of cex.
cex.main: The magnification to be used for main titles relative to the current setting of cex.
cex.text: The magnification to be used for the text added into the plot relative to the current setting of 1.
cex.legend: The magnification to be used for the legend added into the plot relative to the current setting of 1.
HDItext: numeric vector of length == 1 that can be a negative or positive value. Identifies placement of HDI text near credible interval when y='post'. Values are relative to the x-axis values. Default is 0.7.
math: mathematics function performed between multiple parameters when y='post'. Available functions are: 'add', 'subtract', 'multiply', 'divide', or 'n' for none (i,e., no functions). Indicate parameters with parameter argument. For example, when math='subtract', use parameter=list('hospital_A', 'hospital_Z') if you want to estimate the difference between the hospital's outcomes. Use parameter=list(c('hospital_A','hospital_B'), ('hospital_Y','hospital_Z')) to estimate how different the combined hospitals A and B values are from the combined hospitals Y and Z. Additionally, compute statistics like the coefficient of variation when math='divide' and parameter= list('Standard_Deviation', 'Mean'). Default is 'n' for no math function.
es: one element vector that indicates which type of likelihood distribution is relevant in calculating Jacob Cohen's effect sizes between 2 parameters when y='post'. Options are 'bern', 'bin', and 'n' for the Bernoulli or binomial distributions for binary outcomes and none (i.e., no distribution, hence no effect size calculated). For example, to get the posterior distribution summary for the difference between the intervention and control groups on 30-day readmissions or not, use es='bern' or 'bin' when y='post', math='subtract', and parameter=list('intMean', 'ctlMean'). Default is 'n' which indicates not to calculate the effect size.
subset: a single or multiple element character or numeric vector of group names that are a subset of the observations to use in the grouping when y='multi'. The default is NULL, thereby using all observations. Specify, for example, enter c('NY', 'Toronto', 'LA', 'Vancouver') to view a graph with only these cities. Default is NULL.
level: a numeric integer of length == 1, either 1, 2, or 3 that indicates the level of the hierarchical/multilevel model when y='multi' and the type of graph to plot. For example, a multilevel model that estimates the proportion of successful exams by patients is considered level=2. And the successful exam rates by patients from various hospitals is level=3. Graphs can be created separately for both level=2 and level=3 when there is a three-level model. The graph when y='multi' can be produced when level=1 for non-hierarchical models if there are estimates for groups. For example, estimating the patient infection rate of hospitals without a hierarchical structure in the model. Default is NULL.
aorder: a logical indicator on whether the ordering of the group levels are in alphabetical order or not when y='multi'. If aorder=TRUE, results are displayed in an increasing alphabetical order based on level name (e.g., 'LA' before 'NY'). If aorder=FALSE, an increasing numeric order based on group parameter values is performed (e.g., 0.65 before 0.70). Default is TRUE.
round.c: an integer indicating the number of decimal places when rounding numbers y='multi' and y='target'. Default is 2.
...: additional arguments.

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, Second Edition. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers. ISBN 0-8058-0283-5 Gelman, A., Goodrich, B., Gabry, J., & Vehtari, A. (2019). R-squared for Bayesian Regression Models. The American Statistician, 73, 3, 307–309. https://doi.org/10.1080/00031305.2018.1549100

Kruschke, J. (2014). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, Second Edition. New York: Academic Press. ISBN: 9780124058880

Examples

Run this code

# Posterior estimates of length of stay #
# Set up the Bayes object first, convert the chains
blos1 <- Bayes(losmcmc, newdata=TRUE)

#' # Examine Bayesian model diagnostics for estimated length of stay (LOS) #
# Review traceplots, density plots, autocorrelation factor, and Gelman-Rubin statistic
plot(blos1, y="dxt", parameter="muOfY")
plot(blos1, y="dxd", parameter="muOfY")
plot(blos1, y="dxa", parameter="muOfY")
plot(blos1, y="dxg", parameter="muOfY")

# And we can calculate statistics that combine multiple parameters (e.g., calculate
# the mean difference between intervention and control groups). Here we get the
# coefficient of variation by dividing the standard deviation by the mean. We use
# the 'math' argument and arrange our parameters in the proper order.
# We only need the first 4 arguments but we'll add a little more.
plot(x=blos1, y="post", parameter=list("sigmaOfY", "muOfY" ),math="divide",
bcol="cyan", HDItext=.3, main= "Coefficient of Variation")

# Posterior Predictive Checks on how well our model fits the data #
# Estimating center and spread for hospital length of stay
# Generally, we only need the first 6 arguments but we'll modify the plot.
# A model with a gamma likelihood would fit better but this is ok for now.
plot(x=blos1, y="check", type="n", data=hosprog, dv="los",
parameter=c("muOfY", "sigmaOfY"), breaks=30, cex.axis=1.3, lwd=3, xlab=NULL,
pline=20, vlim=c(-2, 20), xlim=c(-2, 20), add.legend="topright",
main="Length of Stay", cex.main=1.5, xpt=5, pcol="red", lcol="orange",
cex.legend=1, bcol="cyan")

# Estimating the regression trend line
# Now lets look at the trend of conc on CO2 uptake from the CO2 data.
# Using a quadratic model with conc^2 would help and an option in ham.
# First, create the Bayes object
bco2 <- Bayes(x=co2mcmc, newdata=TRUE )
# We generally only need the first 7 arguments below.
plot(x=bco2, y="check", type="ol", data=CO2, dv="uptake", iv="conc",
parameter=c("b0","b1"), add.data="al", cex.axis=1.3, lwd=1.5, pline=50,
vlim=c(50, 1100), xlim=c(0, 1100), ylim=c(0, 50), cex=2, cex.lab=2,
pcol="magenta", cex.main=2,cex.legend=1.2,  add.legend="topleft",
lcol="steelblue")              #vlim lets me extrapolate a little

# Hierarchical or Multilevel Model Summary #
# We generally only need the first 3 arguments below. But we'll subset
# on 8 of 12 plants in the level 2 model (observations nested in plants)
# and modify other settings.
plot(x=co2multi, y="multi", level=2, aorder=FALSE,
subset= c("Qn2","Qn3","Qc3","Qc2","Mn3","Mn2","Mc2","Mc3"),
lcol="blue", pcol= c("red", "skyblue"), round.c=1, bcol="yellow",
xlim=c(-.1, 1), legend=NULL, add.legend="topright", lwd=3, cex.lab=1.2,
cex= 2, cex.main=1.5, cex.axis=.75, cex.legend=1.5, X.Lab=NULL)
# And now the level 3 plot (observation in plants in Treatment by type groups)
plot(x=co2multi, y="multi", level=3, aorder=FALSE, lcol="blue", pcol= c("green", "pink"),
round.c=1, bcol="lavender", xlim=c(-.1, 1), legend=NULL, add.legend="right", lwd=3,
cex.lab =1.2, cex= 2, cex.main=1.5, cex.axis=.75, cex.legend=1.5, X.Lab=NULL)

# Targets for length of stay (LOS) #
# Our administrators ask how far are we from our goals, they ask about targets
# in increments of 5 points of probability or specific days. We answer both.
btarget1 <- Bayes(x=losmcmc, y="target", type="n", parameter=c("muOfY","sigmaOfY"),
newdata=TRUE, target=list(p=c(.35,.4,.45, .5, .55),  y=c(3,4))) # 'newdata' for plots
# Target graph using the standard option, overlaying estimate of mode parameter
plot(x=btarget1, y="target", type="n", lcol="purple", tgtcol="blue", xlim=c(3.5, 5))
# Target graph using a posterior predictive check, more intuitive
plot(x=btarget1, y="target", type="n", data=hosprog, dv="los", breaks=30,
cex.axis=1.3, lwd=3, pline=20, vlim=c(-1, 12), xlim=c(-1, 10),
parameter=c("muOfY","sigmaOfY"), add.legend="topright", main="Length of Stay",
cex.main=1.5, xpt=5, pcol="black", lcol="cyan", tgtcol="blue", bcol="orange",
cex.legend=1.5, cex.text = 2)

Run the code above in your browser using DataLab