posterior_predict.stanreg
Draw from posterior predictive distribution
The posterior predictive distribution is the distribution of the outcome implied by the model after using the observed data to update our beliefs about the unknown parameters in the model. Simulating data from the posterior predictive distribution using the observed predictors is useful for checking the fit of the model. Drawing from the posterior predictive distribution at interesting values of the predictors also lets us visualize how a manipulation of a predictor affects (a function of) the outcome(s). With new observations of predictor variables we can use the posterior predictive distribution to generate predicted outcomes.
Usage
"posterior_predict"(object, newdata = NULL, draws = NULL, re.form = NULL, fun = NULL, seed = NULL, offset = NULL, ...)
Arguments
 object
 A fitted model object returned by one of the
rstanarm modeling functions. See
stanregobjects
.  newdata
 Optionally, a data frame in which to look for variables with
which to predict. If omitted, the model matrix is used. If
newdata
is provided and any variables were transformed (e.g. rescaled) in the data used to fit the model, then these variables must also be transformed innewdata
. This only applies if variables were transformed before passing the data to one of the modeling functions and not if transformations were specified inside the model formula. Also see the Note section below for a note about using thenewdata
argument with with binomial models.  draws
 An integer indicating the number of draws to return. The default and maximum number of draws is the size of the posterior sample.
 re.form
 If
object
containsgrouplevel
parameters, a formula indicating which grouplevel parameters to condition on when making predictions.re.form
is specified in the same form as forpredict.merMod
. The default,NULL
, indicates that all estimated grouplevel parameters are conditioned on. To refrain from conditioning on any grouplevel parameters, specifyNA
or~0
. Thenewdata
argument may include new levels of the grouping factors that were specified when the model was estimated, in which case the resulting posterior predictions marginalize over the relevant variables.  fun
 An optional function to apply to the results.
fun
is found by a call tomatch.fun
and so can be specified as a function object, a string naming a function, etc.  seed
 An optional
seed
to use.  offset
 A vector of offsets. Only required if
newdata
is specified and anoffset
argument was specified when fitting the model.  ...
 Currently ignored.
Value

A
draws
by nrow(newdata)
matrix of simulations from the
posterior predictive distribution. Each row of the matrix is a vector of
predictions generated using a single draw of the model parameters from the
posterior distribution. The returned matrix will also have class
"ppd"
to indicate it contains draws from the posterior predictive
distribution.
Note
For binomial models with a number of trials greater than one (i.e., not
Bernoulli models), if newdata
is specified then it must include all
variables needed for computing the number of binomial trials to use for the
predictions. For example if the lefthand side of the model formula is
cbind(successes, failures)
then both successes
and
failures
must be in newdata
. The particular values of
successes
and failures
in newdata
do not matter so
long as their sum is the desired number of trials. If the lefthand side of
the model formula were cbind(successes, trials  successes)
then
both trials
and successes
would need to be in newdata
,
probably with successes
set to 0
and trials
specifying
the number of trials. See the Examples section below and the
How to Use the rstanarm Package for examples.
See Also
pp_check
for graphical posterior predictive checks.
Examples of posterior predictive checking can also be found in the
rstanarm vignettes and demos.
Examples
if (!exists("example_model")) example(example_model)
yrep < posterior_predict(example_model)
table(yrep)
# Using newdata
counts < c(18,17,15,20,10,20,25,13,12)
outcome < gl(3,1,9)
treatment < gl(3,3)
fit3 < stan_glm(counts ~ outcome + treatment, family = poisson(link="log"),
prior = normal(0, 1), prior_intercept = normal(0, 5))
nd < data.frame(treatment = factor(rep(1,3)), outcome = factor(1:3))
ytilde < posterior_predict(fit3, nd, draws = 500)
print(dim(ytilde)) # 500 by 3 matrix (draws by nrow(nd))
ytilde < data.frame(count = c(ytilde),
outcome = rep(nd$outcome, each = 500))
ggplot2::ggplot(ytilde, ggplot2::aes(x=outcome, y=count)) +
ggplot2::geom_boxplot() +
ggplot2::ylab("predicted count")
# Using newdata with a binomial model.
# example_model is binomial so we need to set
# the number of trials to use for prediction.
# This could be a different number for each
# row of newdata or the same for all rows.
# Here we'll use the same value for all.
nd < lme4::cbpp
print(formula(example_model)) # cbind(incidence, size  incidence) ~ ...
nd$size < max(nd$size) + 1L # number of trials
nd$incidence < 0 # set to 0 so size  incidence = number of trials
ytilde < posterior_predict(example_model, newdata = nd)
# Using fun argument to transform predictions
mtcars2 < mtcars
mtcars2$log_mpg < log(mtcars2$mpg)
fit < stan_glm(log_mpg ~ wt, data = mtcars2)
ytilde < posterior_predict(fit, fun = exp)