add_predicted_samples: Add samples from the posterior fit or posterior prediction of a model to a data frame

Description

Given a data frame and a model, adds samples from the posterior fit (aka the linear/link-level predictor) or the posterior predictions of the model to the data frame in a long format.

Usage

add_predicted_samples(newdata, model, var = "pred", ..., n = NULL,
  re_formula = NULL)
add_fitted_samples(newdata, model, var = "estimate", ..., n = NULL,
  re_formula = NULL, category = "category", auxpars = TRUE,
  scale = c("response", "linear"))
predicted_samples(model, newdata, var = "pred", ..., n = NULL,
  re_formula = NULL)
fitted_samples(model, newdata, var = "estimate", ..., n = NULL,
  re_formula = NULL, category = "category", auxpars = TRUE,
  scale = c("response", "linear"))
# S3 method for default
predicted_samples(model, newdata, ...)
# S3 method for default
fitted_samples(model, newdata, ...)
# S3 method for stanreg
predicted_samples(model, newdata, var = "pred", ...,
  n = NULL, re_formula = NULL)
# S3 method for stanreg
fitted_samples(model, newdata, var = "estimate", ...,
  n = NULL, re_formula = NULL, category = "category", auxpars = TRUE,
  scale = c("response", "linear"))
# S3 method for brmsfit
predicted_samples(model, newdata, var = "pred", ...,
  n = NULL, re_formula = NULL)
# S3 method for brmsfit
fitted_samples(model, newdata, var = "estimate", ...,
  n = NULL, re_formula = NULL, category = "category", auxpars = TRUE,
  scale = c("response", "linear"))

Arguments

newdata

Data frame to generate predictions from. If omitted, most model types will generate predictions from the data used to fit the model.

model

A supported Bayesian model fit / MCMC object that can provide fits and predictions. Supported models are listed in the second section of tidybayes-models: Models Supporting Prediction. Other functions in this package (like spread_samples) support a wider range of models, but add_fitted_samples and add_predicted_samples require a model that provides an interface for generating predictions. More generic Bayesian modeling interfaces like runjags and rstan are therefore not directly supported by these functions; only wrappers around those languages that provide predictions, like rstanarm and brms, are supported here.

var

The name of the output column for the predictions (default "pred") or fits (default "estimate", for compatibility with tidy).

...

Additional arguments passed to the underlying prediction method for the type of model given.

n

The number of samples per prediction / fit to return.

re_formula

Formula containing group-level effects to be considered in the prediction. If NULL (default), include all group-level effects; if NA, include no group-level effects. Some model types (such as brm and stanreg objects) allow marginalizing over grouping factors by specifying new levels of a factor in newdata. In the case of brm, you must also pass allow_new_levels = TRUE here to include new levels (see predict.brmsfit).

category

For some ordinal and multinomial models (notably, brm models but not stan_polr models), multiple sets of rows will be returned per estimate for fitted_samples, one for each category. The category argument specifies the name of the column to put the category names into in the resulting data frame. Multiple rows per response are returned only for some model types because tidybayes takes the approach of tidying whatever output the model gives, and the output from different modeling functions differs on this point. See vignette("tidy-brms") and vignette("tidy-rstanarm") for examples of dealing with output from ordinal models using both approaches.

auxpars

For fitted_samples and add_fitted_samples: Should auxiliary parameters be included in the output? Valid only for models that support auxiliary parameters (such as submodels for variance parameters as in brm). If TRUE, auxiliary parameters are included in the output as additional columns named after each parameter (alternative names can be provided using a list or named vector, e.g. c(sigma.hat = "sigma") would output the "sigma" parameter from a model as a column named "sigma.hat"). If FALSE, auxiliary parameters are not included.

scale

Either "response" or "linear". If "response", results are returned on the scale of the response variable. If "linear", fitted values are returned on the scale of the linear predictor.
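
As an illustration of the scale argument, the following sketch fits a logistic regression and requests fits on both scales. The names m_am and hp_grid are hypothetical, chosen only for this example, and the code assumes the version of tidybayes documented on this page together with rstanarm.

```r
library(dplyr)
library(rstanarm)
library(tidybayes)

# hypothetical example model: probability of a manual transmission given horsepower
# (1 chain / few iterations only so the sketch runs quickly; do not use in practice)
m_am <- stan_glm(am ~ hp, data = mtcars, family = binomial(),
  chains = 1, iter = 500, refresh = 0)

# hypothetical grid of predictor values
hp_grid <- data.frame(hp = c(100, 200))

# fits on the scale of the response (probabilities for this model)
on_response <- hp_grid %>%
  add_fitted_samples(m_am, scale = "response")

# fits on the scale of the linear predictor (log-odds for this model)
on_linear <- hp_grid %>%
  add_fitted_samples(m_am, scale = "linear")

# compare the ranges of the two sets of fits
range(on_response$estimate)
range(on_linear$estimate)
```

For a binomial model with a logit link, the "response" fits should fall between 0 and 1, while the "linear" fits are unbounded.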

Value

A data frame (actually, a tibble) with a .row column (a factor grouping rows from the input newdata), a .chain column (the chain each sample came from, or NA if the model does not provide chain information), an .iteration column (the iteration the sample came from), and a column named by var ("pred" by default for predictions, "estimate" by default for fits) containing a sample from the posterior predictive distribution or the posterior fit. For convenience, the resulting data frame comes grouped by the original input rows.
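
The long format can be pictured with a small base R sketch. It is not the package's implementation: the draws matrix below is a simulated stand-in for model output (one row per draw, one column per row of newdata), and the column order of the real result may differ.

```r
# simulated stand-in for a matrix of posterior predictions:
# one row per draw, one column per row of newdata (hypothetical values)
set.seed(1)
newdata <- data.frame(hp = c(100, 150, 200))
n_draws <- 4
draws <- matrix(rnorm(n_draws * nrow(newdata)), nrow = n_draws)

# index of the newdata row that each long-format row belongs to;
# as.vector() unrolls the matrix column by column, so all draws
# for input row 1 come first, then all draws for input row 2, etc.
row_index <- rep(seq_len(nrow(newdata)), each = n_draws)

long <- data.frame(
  newdata[row_index, , drop = FALSE],
  .row = factor(row_index),
  .chain = NA_integer_,
  .iteration = rep(seq_len(n_draws), times = nrow(newdata)),
  pred = as.vector(draws),
  row.names = NULL
)

head(long)
```

Each input row is repeated once per draw, so the result has n_draws * nrow(newdata) rows.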

Details

add_fitted_samples adds samples from the posterior linear predictor (or the "link") to the data. It corresponds to posterior_linpred in rstanarm or fitted.brmsfit in brms.

add_predicted_samples adds samples from the posterior prediction to the data. It corresponds to posterior_predict in rstanarm or predict.brmsfit in brms.
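
To make the correspondence concrete, the sketch below calls the two rstanarm functions directly. Both return a matrix with one row per draw and one column per row of newdata, which is the wide counterpart of the long format described under Value. The names m and hp_grid are hypothetical, chosen for this example.

```r
library(rstanarm)

# hypothetical example model
# (1 chain / few iterations only so the sketch runs quickly; do not use in practice)
m <- stan_glm(mpg ~ hp, data = mtcars, chains = 1, iter = 500, refresh = 0)

# hypothetical grid of predictor values
hp_grid <- data.frame(hp = c(100, 200))

# posterior linear predictor: one row per posterior draw,
# one column per row of hp_grid
linpred <- posterior_linpred(m, newdata = hp_grid)

# posterior predictions: draws = 10 requests 10 draws
pred <- posterior_predict(m, newdata = hp_grid, draws = 10)

dim(linpred)
dim(pred)
```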

add_fitted_samples and fitted_samples are alternate spellings of the same function with opposite order of the first two arguments to facilitate use in data processing pipelines that start either with a data frame or a model. Similarly, add_predicted_samples and predicted_samples are alternate spellings.

Given equal choice between the two, add_fitted_samples and add_predicted_samples are the preferred spellings.
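
The two spellings can be compared side by side in a short sketch. It assumes the version of tidybayes documented on this page; the names m, hp_grid, from_data and from_model are hypothetical.

```r
library(dplyr)
library(rstanarm)
library(tidybayes)

# hypothetical example model
# (1 chain / few iterations only so the sketch runs quickly; do not use in practice)
m <- stan_glm(mpg ~ hp, data = mtcars, chains = 1, iter = 500, refresh = 0)

# hypothetical grid of predictor values
hp_grid <- data.frame(hp = c(100, 200))

# pipeline that starts with a data frame: newdata is the first argument
from_data <- hp_grid %>%
  add_fitted_samples(m, n = 10)

# pipeline that starts with a model: model is the first argument
from_model <- m %>%
  fitted_samples(hp_grid, n = 10)

# both results have the same columns; the sampled draws themselves
# may differ because n selects a random subset of the posterior
names(from_data)
names(from_model)
```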

Examples

# NOT RUN {
library(ggplot2)
library(dplyr)
library(rstanarm)
library(modelr)

theme_set(theme_light())

m_mpg = stan_glm(mpg ~ hp * cyl, data = mtcars,
  # 1 chain / few iterations just so example runs quickly
  # do not use in practice
  chains = 1, iter = 500)

# sample 100 fit lines from the posterior and overplot them
mtcars %>%
  group_by(cyl) %>%
  data_grid(hp = seq_range(hp, n = 101)) %>%
  add_fitted_samples(m_mpg, n = 100) %>%
  ggplot(aes(x = hp, y = mpg, color = ordered(cyl))) +
  geom_line(aes(y = estimate, group = paste(cyl, .iteration)), alpha = 0.25) +
  geom_point(data = mtcars)

# plot posterior predictive intervals
mtcars %>%
  group_by(cyl) %>%
  data_grid(hp = seq_range(hp, n = 101)) %>%
  add_predicted_samples(m_mpg) %>%
  ggplot(aes(x = hp, y = mpg, color = ordered(cyl))) +
  stat_lineribbon(aes(y = pred), .prob = c(.99, .95, .8, .5), alpha = 0.25) +
  geom_point(data = mtcars) +
  scale_fill_brewer(palette = "Greys")

# }