posterior_predict: Draw from posterior predictive distribution

Description

The posterior predictive distribution is the distribution of the outcome implied by the model after using the observed data to update our beliefs about the unknown parameters in the model. Simulating data from the posterior predictive distribution using the observed predictors is useful for checking the fit of the model. Drawing from the posterior predictive distribution at interesting values of the predictors also lets us visualize how a manipulation of a predictor affects (a function of) the outcome(s). With new observations of predictor variables we can use posterior predictive distribution to generate predicted outcomes.

Usage

posterior_predict(object, newdata = NULL, draws = NULL, re.form = NULL,
  fun = NULL, seed = NULL, ...)

Arguments

object

A fitted model object returned by one of the rstanarm modeling functions. See stanreg-objects.

newdata

Optionally, a data frame in which to look for variables with which to predict. If omitted, the model matrix is used. If newdata is provided and any variables were transformed (e.g. rescaled) in the data used to fit the model, then these va

draws

An integer indicating the number of draws to return. The default and maximum number of draws is the size of the posterior sample.

re.form

If object contains group-level parameters, a formula indicating which group-level parameters to condition on when making predictions. re.form is specified in the s

fun

An optional function to apply to the results. fun is found by a call to match.fun and so can be specified as a function object, a string naming a function, etc.

seed

An optional seed to use.

...

Currently unused.

Value

A draws by nrow(newdata) matrix of simulations from the posterior predictive distribution. Each row of the matrix is a vector of predictions generated using a single draw of the model parameters from the posterior distribution.

Examples

Run this code

yrep <- posterior_predict(example_model)
table(yrep)

# Using newdata
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
fit3 <- stan_glm(counts ~ outcome + treatment, family = poisson(link="log"),
                prior = normal(0, 1), prior_intercept = normal(0, 5))
nd <- data.frame(treatment = factor(rep(1,3)), outcome = factor(1:3))
ytilde <- posterior_predict(fit3, nd, draws = 500)
print(dim(ytilde))  # 500 by 3 matrix (draws by nrow(nd))
ytilde <- data.frame(count = c(ytilde), 
                     outcome = rep(nd$outcome, each = 500))
ggplot(ytilde, aes(x=outcome, y=count)) + 
  geom_boxplot() + 
  ylab("predicted count")


# Using newdata with a binomial model
# example_model is binomial so we need to set
# the number of trials to use for prediction.
# This could be a different number for each 
# row of newdata or the same for all rows.
# Here we'll use the same value for all.
nd <- lme4::cbpp
print(formula(example_model))  # cbind(incidence, size - incidence) ~ ...
nd$size <- max(nd$size) + 1L   # number of trials
nd$incidence <- 0  # set to 0 so size - incidence = number of trials
ytilde <- posterior_predict(example_model, newdata = nd)


# Using fun argument to transform predictions
mtcars2 <- mtcars
mtcars2$log_mpg <- log(mtcars2$mpg)
fit <- stan_glm(log_mpg ~ wt, data = mtcars2)
ytilde <- posterior_predict(fit, fun = exp)

Run the code above in your browser using DataLab

Description

Usage

Arguments

Value

See Also

Examples