compare_bayesm_by_predictor: Compare Bayesian Models by Predictor Using Posterior Predictive Simulations

Description

This function compares posterior predictive distributions from several Bayesian models across levels of a selected predictor variable. For numeric predictors, the variable is binned; for categorical predictors, the original factor levels are used directly. The function visualizes the distribution of simulated means and standard deviations per predictor level alongside the observed values.

Usage

compare_bayesm_by_predictor(
  data,
  models,
  parameters = NULL,
  var.plot,
  intercept = NULL,
  ypredict = NULL,
  outcome,
  mbreaks
)

Value

A list containing:

models_summary: A data frame summarizing the posterior predictive means and standard deviations per draw, model, and predictor level (or bin).
p_mean: A ggplot showing the distribution of simulated outcome means for each predictor level (or bin) and model, overlaid with the observed mean and the sample size for that level.
p_sd: A ggplot showing the distribution of simulated outcome standard deviations for each predictor level (or bin) and model, overlaid with the observed standard deviation and the sample size for that level.

The function also prints the plots side by side using ggarrange().

Arguments

data: A data frame containing the original dataset.
models: A named list of fitted stanfit model objects. Each name will be used as the model label.
parameters: Optional. A named list mapping each model to a named character vector where each name is a variable in the data and the value is the name of the corresponding parameter/coefficient in the model. Must have the same names as models. Required if ypredict is not provided.
var.plot: A single character string. Name of the predictor variable in data used for binning and plotting. The variable must be a predictor in every model.
intercept: Optional. A named list with the intercept parameter names for each model. Each entry should be a character string or NULL if no intercept is used. Must have the same names as models.
ypredict: Optional. A named list of posterior predictive matrices, one per model, with the same names as models. Each matrix should have one row per posterior draw and one column per data point. If not provided, predictions are computed internally, which requires parameters.
outcome: A character string. The name of the outcome variable in data.
mbreaks: Number of bins to create with cut() when var.plot is numeric; ignored when var.plot is a factor.
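
The shapes of parameters, intercept, and ypredict can be illustrated with a minimal base R sketch. Everything in it is hypothetical: the variable names (x, y), the model labels (m1, m2), the parameter names (alpha, beta_x, b_x, sigma), and the Gaussian linear model used to simulate predictions are assumptions for illustration only, not the function's actual internals. The simulated draws stand in for what would normally be extracted from fitted stanfit objects.

```r
set.seed(1)

# Hypothetical dataset: outcome y and one numeric predictor x
n <- 50
data <- data.frame(x = rnorm(n))
data$y <- 1 + 0.5 * data$x + rnorm(n)

# parameters: one entry per model; each entry is a named character vector
# (name = variable in data, value = parameter name in that model)
parameters <- list(
  m1 = c(x = "beta_x"),
  m2 = c(x = "b_x")
)

# intercept: one entry per model; a parameter name, or NULL if none
intercept <- list(m1 = "alpha", m2 = NULL)

# Stand-in posterior draws for model m1 (assumed parameter names)
n_draws <- 200
draws_m1 <- list(
  alpha  = rnorm(n_draws, 1, 0.1),
  beta_x = rnorm(n_draws, 0.5, 0.05),
  sigma  = abs(rnorm(n_draws, 1, 0.05))
)

# Linear predictor, draws x data points, built from the name mappings
mu <- matrix(draws_m1[[intercept$m1]], nrow = n_draws, ncol = n)
for (v in names(parameters$m1)) {
  mu <- mu + outer(draws_m1[[parameters$m1[[v]]]], data[[v]])
}

# Posterior predictive matrix under an assumed Gaussian likelihood;
# sd is recycled so that row i uses sigma[i]
yrep_m1 <- mu + matrix(rnorm(n_draws * n, sd = draws_m1$sigma),
                       nrow = n_draws, ncol = n)

# ypredict: one matrix per model, rows = draws, columns = data points
# (m2 is filled with arbitrary noise purely to show the shape)
ypredict <- list(
  m1 = yrep_m1,
  m2 = matrix(rnorm(n_draws * n), nrow = n_draws, ncol = n)
)

# All three lists carry the same names, and columns match rows of data
stopifnot(
  identical(names(parameters), names(intercept)),
  identical(names(parameters), names(ypredict)),
  all(vapply(ypredict, ncol, integer(1)) == nrow(data))
)
```

Lists built this way would be passed as the parameters, intercept, and ypredict arguments, with models named identically.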

Details

This function provides a visual diagnostic for comparing posterior predictive summaries across multiple Bayesian models. The predictor variable can be either continuous or categorical. For continuous variables, the range is divided into bins using cut() and mbreaks; for categorical variables, no binning is applied. Posterior predictive distributions are either precomputed via ypredict or generated internally using parameters and (optionally) intercept.
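
The binning and per-draw summaries described above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the function's actual code: the data, the stand-in predictive matrix yrep, and the column names of summary_df are all hypothetical.

```r
set.seed(42)

# Hypothetical data and a stand-in posterior predictive matrix
n <- 60
n_draws <- 100
x <- runif(n, 0, 10)
y <- 2 + 0.3 * x + rnorm(n)

# rows = posterior draws, columns = data points;
# column j is centred on the mean for data point j
yrep <- matrix(
  rnorm(n_draws * n, mean = rep(2 + 0.3 * x, each = n_draws)),
  nrow = n_draws, ncol = n
)

# Numeric predictors are binned with cut(); factors keep their levels
mbreaks <- 4
bins <- if (is.numeric(x)) cut(x, breaks = mbreaks) else factor(x)

# Per-draw mean and sd of the simulated outcome within each bin
# (each result is a draws x bins matrix)
sim_mean <- sapply(levels(bins), function(b) {
  rowMeans(yrep[, bins == b, drop = FALSE])
})
sim_sd <- sapply(levels(bins), function(b) {
  apply(yrep[, bins == b, drop = FALSE], 1, sd)
})

# Observed summaries and sample size per bin, for the overlay
obs_mean <- tapply(y, bins, mean)
obs_sd <- tapply(y, bins, sd)
n_bin <- table(bins)

# Long-format summary: one row per draw and bin (column names assumed)
summary_df <- data.frame(
  model = "m1",
  draw  = rep(seq_len(n_draws), times = nlevels(bins)),
  bin   = rep(levels(bins), each = n_draws),
  mean  = as.vector(sim_mean),
  sd    = as.vector(sim_sd)
)
```

A model that fits well should place obs_mean and obs_sd for each bin inside the bulk of the corresponding simulated distributions.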