# PPC-overview

##### Graphical posterior predictive checking

The bayesplot PPC module provides various plotting functions for creating graphical displays comparing observed data to simulated data from the posterior predictive distribution. See below for a brief discussion of the ideas behind posterior predictive checking, a description of the structure of this package, and tips on providing an interface to bayesplot from another package.

##### Details

The idea behind posterior predictive checking is simple: if a model is a good fit, then we should be able to use it to generate data that look a lot like the data we observed.

### Posterior predictive distribution

To generate the data used for posterior predictive checks we simulate from
the *posterior predictive distribution*. The posterior predictive
distribution is the distribution of the outcome variable implied by a model
after using the observed data \(y\) (a vector of outcome values), and
typically predictors \(X\), to update our beliefs about the unknown
parameters \(\theta\) in the model. For each draw of the parameters
\(\theta\) from the posterior distribution \(p(\theta \,|\, y,
X)\) we generate an entire vector of outcomes. The result is
an \(S \times N\) matrix of simulations, where \(S\) is the
size of the posterior sample (number of draws from the posterior
distribution) and \(N\) is the number of data points in \(y\). That is,
each row of the matrix is an individual "replicated" dataset of \(N\)
observations.
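The construction of that \(S \times N\) matrix can be sketched in a few lines of base R. This is a toy illustration, not bayesplot code: the "posterior draws" of \(\mu\) and \(\sigma\) below are stand-ins that would in practice come from a fitted Bayesian model (e.g. via rstan, rstanarm, or brms).

```r
# Toy sketch: building the S x N matrix of posterior predictive simulations
# for a simple normal model y ~ Normal(mu, sigma).
set.seed(123)
N <- 50   # number of observed data points
S <- 1000 # size of the posterior sample (number of draws)

y <- rnorm(N, mean = 2, sd = 1)  # observed outcome (toy data)

# Stand-in posterior draws of the parameters (illustrative only).
mu_draws    <- rnorm(S, mean = mean(y), sd = sd(y) / sqrt(N))
sigma_draws <- sd(y) * sqrt((N - 1) / rchisq(S, df = N - 1))

# For each posterior draw, simulate an entire replicated dataset:
# row s of yrep is one replication of all N observations.
yrep <- matrix(NA_real_, nrow = S, ncol = N)
for (s in seq_len(S)) {
  yrep[s, ] <- rnorm(N, mean = mu_draws[s], sd = sigma_draws[s])
}

dim(yrep)  # S x N: each row is one replicated dataset
```

This draws-by-observations orientation (rows are replications, columns are data points) is the `yrep` layout the bayesplot plotting functions expect.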

### Notation

When simulating from the posterior predictive distribution we can use either
the same values of the predictors \(X\) that we used when fitting the model
or new observations of those predictors. When we use the same values of
\(X\) we denote the resulting simulations by \(y^{rep}\) as they
can be thought of as *replications* of the outcome \(y\) rather than
predictions for future observations. This corresponds to the notation from
Gelman et al. (2013) and is the notation used throughout the documentation
for this package.

### Graphical posterior predictive checking

Using the datasets \(y^{rep}\) drawn from the posterior predictive distribution, the functions in the bayesplot package produce various graphical displays comparing the observed data \(y\) to the replications. For a more thorough discussion of posterior predictive checking see Chapter 6 of Gelman et al. (2013).

##### PPC plotting functions

The plotting functions for posterior predictive checking in this package are organized into several categories, each with its own documentation:

- **Distributions**: Histograms, kernel density estimates, boxplots, and other plots comparing the empirical distribution of the observed data `y` to the distributions of individual replicated datasets (rows) in `yrep`.
- **Test statistics**: The distribution of a test statistic, or a pair of test statistics, over the replicated datasets (rows) in `yrep`, compared to the value of the statistic(s) computed from `y`.
- **Intervals**: Interval estimates of `yrep` with `y` overlaid. The x-axis variable can optionally be specified by the user (e.g. to plot against a predictor variable or over time).
- **Predictive errors**: Plots of predictive errors (`y - yrep`) computed from `y` and the replicated datasets (rows) in `yrep`. For binomial models, binned error plots are also available.
- **Scatterplots**: Scatterplots (and similar visualizations) of the observed data `y` vs. individual replicated datasets (rows) in `yrep`, or vs. the average value of the distributions of each data point (columns) in `yrep`.
- **Plots for discrete outcomes**: PPC functions that can only be used if `y` and `yrep` are discrete. For example, rootograms for count outcomes and bar plots for ordinal, categorical, and multinomial outcomes.
- **LOO predictive checks**: PPC functions for predictive checks based on (approximate) leave-one-out (LOO) cross-validation.
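A few of the categories above can be sketched with the small example datasets that ship with bayesplot (`example_y_data()` returns an outcome vector and `example_yrep_draws()` a draws-by-observations matrix). This assumes bayesplot (and its ggplot2 dependency) is installed; each call returns a ggplot object.

```r
library(bayesplot)

y    <- example_y_data()     # observed outcome vector
yrep <- example_yrep_draws() # S x N matrix of posterior predictive draws

ppc_dens_overlay(y, yrep[1:50, ]) # Distributions: y vs. 50 replications
ppc_stat(y, yrep, stat = "mean")  # Test statistics: distribution of means
ppc_intervals(y, yrep)            # Intervals: yrep intervals with y overlaid
ppc_error_hist(y, yrep[1:6, ])    # Predictive errors: histograms of y - yrep
ppc_scatter_avg(y, yrep)          # Scatterplots: y vs. average of yrep columns
```

The function names follow a consistent `ppc_*` prefix, so the categories in the list above map directly onto families of functions (see the individual help pages linked under See Also).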

##### Providing an interface for posterior predictive checking from another package

In addition to the various plotting functions, the bayesplot package provides the S3 generic `pp_check`. Authors of R packages for Bayesian inference are encouraged to define `pp_check` methods for the fitted model objects created by their packages. See the package vignettes for more details and a simple example, and see the rstanarm and brms packages for full examples of `pp_check` methods.
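A minimal method might look like the sketch below. The class name `"foofit"` and the `$y`/`$yrep` fields are illustrative stand-ins for whatever a package's fitted-model objects actually store; see the bayesplot vignettes for the recommended pattern.

```r
library(bayesplot)

# Hypothetical pp_check method for fitted objects of class "foofit".
# object$y is assumed to hold the observed outcome and object$yrep the
# S x N matrix of posterior predictive draws (illustrative field names).
pp_check.foofit <- function(object, ..., type = c("dens_overlay", "hist")) {
  type <- match.arg(type)
  y    <- object$y
  yrep <- object$yrep
  switch(type,
    dens_overlay = ppc_dens_overlay(y, yrep, ...),
    hist         = ppc_hist(y, yrep[1:8, ], ...)
  )
}

# Usage, given a fitted object `fit` of class "foofit":
# pp_check(fit)
# pp_check(fit, type = "hist")
```

Dispatching to bayesplot's `ppc_*` functions this way gives users of the downstream package a familiar `pp_check(fit)` entry point without that package reimplementing any plotting code.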

##### References

Gabry, J., Simpson, D., Vehtari, A., Betancourt, M., and Gelman,
A. (2018). Visualization in Bayesian workflow. *Journal of the Royal
Statistical Society Series A*, accepted for publication. arXiv preprint:
http://arxiv.org/abs/1709.01449.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari,
A., and Rubin, D. B. (2013). *Bayesian Data Analysis.* Chapman & Hall/CRC
Press, London, third edition. (Ch. 6)

##### See Also

Other PPCs: `PPC-discrete`, `PPC-distributions`, `PPC-errors`, `PPC-intervals`, `PPC-loo`, `PPC-scatterplots`, `PPC-test-statistics`
*Documentation reproduced from package bayesplot, version 1.6.0, License: GPL (>= 3)*