
Add rvars for the linear predictor, posterior expectation, posterior predictive, or residuals of a model to a data frame
Given a data frame and a model, adds rvars of draws from the linear/link-level predictor,
the expectation of the posterior predictive, or the posterior predictive to
the data frame.
add_epred_rvars(
newdata,
object,
...,
value = ".epred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
dpar = NULL,
columns_to = NULL
)
epred_rvars(
object,
newdata,
...,
value = ".epred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
dpar = NULL,
columns_to = NULL
)
# S3 method for default
epred_rvars(
object,
newdata,
...,
value = ".epred",
seed = NULL,
dpar = NULL,
columns_to = NULL
)
# S3 method for stanreg
epred_rvars(
object,
newdata,
...,
value = ".epred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
dpar = NULL,
columns_to = NULL
)
# S3 method for brmsfit
epred_rvars(
object,
newdata,
...,
value = ".epred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
dpar = NULL,
columns_to = NULL
)
add_linpred_rvars(
newdata,
object,
...,
value = ".linpred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
dpar = NULL,
columns_to = NULL
)
linpred_rvars(
object,
newdata,
...,
value = ".linpred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
dpar = NULL,
columns_to = NULL
)
# S3 method for default
linpred_rvars(
object,
newdata,
...,
value = ".linpred",
seed = NULL,
dpar = NULL,
columns_to = NULL
)
# S3 method for stanreg
linpred_rvars(
object,
newdata,
...,
value = ".linpred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
dpar = NULL,
columns_to = NULL
)
# S3 method for brmsfit
linpred_rvars(
object,
newdata,
...,
value = ".linpred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
dpar = NULL,
columns_to = NULL
)
add_predicted_rvars(
newdata,
object,
...,
value = ".prediction",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
columns_to = NULL
)
predicted_rvars(
object,
newdata,
...,
value = ".prediction",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
columns_to = NULL
)
# S3 method for default
predicted_rvars(
object,
newdata,
...,
value = ".prediction",
seed = NULL,
columns_to = NULL
)
# S3 method for stanreg
predicted_rvars(
object,
newdata,
...,
value = ".prediction",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
columns_to = NULL
)
# S3 method for brmsfit
predicted_rvars(
object,
newdata,
...,
value = ".prediction",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
columns_to = NULL
)
A data frame (actually, a tibble) equal to the input newdata with
additional columns added containing rvars representing the requested predictions or fits.
newdata: Data frame to generate predictions from.
object: A supported Bayesian model fit that can provide fits and predictions. Supported models
are listed in the second section of tidybayes-models: Models Supporting Prediction. While other
functions in this package (like spread_rvars()) support a wider range of models, to work with
add_epred_rvars(), add_predicted_rvars(), etc. a model must provide an interface for generating
predictions. More generic Bayesian modeling interfaces like runjags and rstan are therefore not
directly supported for these functions; only wrappers around those languages that provide
predictions, like rstanarm and brms, are supported here.
...: Additional arguments passed to the underlying prediction method for the type of model given.
value: The name of the output column:
for [add_]epred_rvars(), defaults to ".epred".
for [add_]predicted_rvars(), defaults to ".prediction".
for [add_]linpred_rvars(), defaults to ".linpred".
ndraws: The number of draws to return, or NULL to return all draws.
seed: A seed to use when subsampling draws (i.e. when ndraws is not NULL).
re_formula: Formula containing group-level effects to be considered in the prediction.
If NULL (default), include all group-level effects; if NA, include no group-level effects.
Some model types (such as brms::brmsfit and rstanarm::stanreg objects) allow
marginalizing over grouping factors by specifying new levels of a factor in newdata. In the case of
brms::brm(), you must also pass allow_new_levels = TRUE here to include new levels (see
brms::posterior_predict()).
dpar: For add_epred_rvars() and add_linpred_rvars(): Should distributional regression
parameters be included in the output? Valid only for models that support distributional regression parameters,
such as submodels for variance parameters (as in brms::brm()). If TRUE, distributional regression
parameters are included in the output as additional columns named after each parameter
(alternative names can be provided using a list or named vector, e.g. c(sigma.hat = "sigma")
would output the "sigma" parameter from a model as a column named "sigma.hat").
If NULL or FALSE (the default), distributional regression parameters are not included.
columns_to: For some models, such as ordinal, multinomial, and multivariate models (notably, brms::brm()
models but not rstanarm::stan_polr() models), the column of predictions in the resulting data frame
may include nested columns. For example, for ordinal/multinomial models, these columns correspond to
different categories of the response variable. It may be more convenient to turn these nested columns
into rows in the output; if this is desired, set columns_to to a string giving the name of the column
in which the nested column names should be placed. In this case, a .row column will also be added to
the result indicating which rows of the output correspond to the same row in newdata.
See vignette("tidy-posterior") for examples of dealing with output from ordinal models.
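As a rough illustration of how several of these arguments combine, the sketch below fits a small multilevel model and requests 100 draws of the conditional expectation with group-level effects excluded. The model formula, the names m_hp and mean_mpg, and the seed value are assumptions chosen for this example only.

```r
library(brms)
library(tidybayes)

# Hypothetical multilevel model: a varying intercept per cylinder count.
# 1 chain / few iterations only so the sketch runs quickly; do not use in practice.
m_hp <- brm(mpg ~ hp + (1 | cyl), data = mtcars, chains = 1, iter = 500)

newdata <- data.frame(hp = c(100, 150, 200), cyl = 4)

preds <- add_epred_rvars(
  newdata, m_hp,
  value = "mean_mpg",  # rename the output column from the default ".epred"
  ndraws = 100,        # subsample 100 posterior draws
  seed = 1234,         # make the subsample reproducible
  re_formula = NA      # exclude all group-level effects
)

# Each element of the new column is an rvar carrying the requested number of draws
posterior::ndraws(preds$mean_mpg)
```

Leaving re_formula at its default of NULL would instead include the varying intercept for cyl in the expectation.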
Matthew Kay
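Consider a model like:
y ~ SomeDist(theta_1, theta_2)
f_1(theta_1) = alpha_1 + beta_1 * x
f_2(theta_2) = alpha_2 + beta_2 * x
This model has:
an outcome variable, y
a response distribution, SomeDist, having parameters theta_1 and theta_2
a single predictor, x
coefficients alpha_1, beta_1, alpha_2, and beta_2, with link functions f_1 and f_2
If we fit this model to some observed data, y_obs, and predictors x_obs, then given new predictors
x_new supplied in newdata, the functions for posterior draws are defined as follows: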
add_predicted_rvars() adds rvars containing draws from the posterior predictive distribution
to the data. It corresponds to rstanarm::posterior_predict() or brms::posterior_predict().
add_epred_rvars() adds rvars containing draws from the expectation of the posterior predictive
distribution, aka the conditional expectation, to the data. It corresponds to
rstanarm::posterior_epred() or brms::posterior_epred().
Not all models support this function.
add_linpred_rvars() adds rvars containing draws from the posterior linear predictors to the data.
It corresponds to rstanarm::posterior_linpred() or brms::posterior_linpred().
Depending on the model type and additional parameters passed, this may be:
The untransformed linear predictor, as returned by add_linpred_rvars(transform = FALSE)
for brms and rstanarm models. It is analogous to type = "link" in predict.glm().
The inverse-link transformed linear predictor, as returned by add_linpred_rvars(transform = TRUE)
for brms and rstanarm models. It is analogous to type = "response" in predict.glm().
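The distinction between these three quantities can be sketched without fitting a model at all. The example below assumes a Poisson regression with a log link and uses simulated values as stand-ins for posterior draws of the coefficients. The names alpha, beta, and x_new are hypothetical, and this is not a description of how tidybayes computes its results internally.

```r
set.seed(1)
n_draws <- 4000

# Stand-ins for posterior draws of an intercept and a slope (assumed values)
alpha <- rnorm(n_draws, mean = 0.5, sd = 0.05)
beta  <- rnorm(n_draws, mean = 0.2, sd = 0.02)
x_new <- 3

# Link-level linear predictor (the "linpred" quantity, untransformed)
linpred <- alpha + beta * x_new

# Conditional expectation (the "epred" quantity): for a Poisson response
# with a log link, the mean is the inverse link applied to the linear predictor
epred <- exp(linpred)

# Posterior predictive (the "prediction" quantity): one simulated outcome per draw
prediction <- rpois(n_draws, lambda = epred)

# The predictive draws are centred near the expectation but are far more spread out
c(mean_epred = mean(epred), mean_prediction = mean(prediction))
c(sd_epred = sd(epred), sd_prediction = sd(prediction))
```

The extra spread in prediction comes from the response distribution itself, which is why posterior predictive intervals are wider than intervals around the conditional mean.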
NOTE: add_linpred_rvars(transform = TRUE) and add_epred_rvars() may be equivalent but
are not guaranteed to be. They are equivalent when the expectation of the response
distribution is equal to its first parameter (true of the Normal distribution, for example,
but not of the lognormal). If you want the expectation of the posterior predictive, it is best
to use add_epred_rvars() if available, and if not available, verify this property holds prior
to using add_linpred_rvars().
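A quick numerical check shows why the two can differ. The sketch below uses base R only, with assumed parameter values (mu = 3, sigma = 0.5) for a lognormal response, whose first parameter is the mean of log(y) rather than the mean of y.

```r
set.seed(42)
mu <- 3
sigma <- 0.5

# Simulate a large sample from the assumed lognormal response distribution
y <- rlnorm(1e6, meanlog = mu, sdlog = sigma)

# exp(mu) is the median of a lognormal distribution ...
exp(mu)
median(y)

# ... while its mean is exp(mu + sigma^2 / 2), which is strictly larger
exp(mu + sigma^2 / 2)
mean(y)
```

For a model like this, exponentiating the linear predictor gives a median, so it cannot stand in for the conditional expectation.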
The corresponding functions without add_ as a prefix are alternate spellings
with the opposite order of the first two arguments: e.g. add_predicted_rvars(newdata, object)
versus predicted_rvars(object, newdata). This facilitates use in data
processing pipelines that start either with a data frame or a model.
Given equal choice between the two, the spellings prefixed with add_ are preferred.
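The two spellings can be compared side by side. This is a minimal sketch, assuming a simple brms model; the names m_hp and grid are hypothetical.

```r
library(brms)
library(tidybayes)

# 1 chain / few iterations only so the sketch runs quickly; do not use in practice
m_hp <- brm(mpg ~ hp, data = mtcars, chains = 1, iter = 500)
grid <- data.frame(hp = c(100, 200))

# Data-frame-first spelling: fits a pipeline that starts from a data frame
from_data <- add_epred_rvars(grid, m_hp)

# Model-first spelling: fits a pipeline that starts from the model
from_model <- epred_rvars(m_hp, grid)

# Both calls use all posterior draws, so their per-row posterior means should agree
all.equal(mean(from_data$.epred), mean(from_model$.epred))
```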
See add_predicted_draws() for the analogous functions that use a long-data-frame-of-draws
format instead of a data-frame-of-rvars format. See spread_rvars()
for manipulating posteriors directly.
if (FALSE) {
library(ggplot2)
library(dplyr)
library(posterior)
library(brms)
library(modelr)
library(tidybayes)
theme_set(theme_light())
m_mpg = brm(
  mpg ~ hp * cyl, data = mtcars, family = lognormal(),
  # 1 chain / few iterations just so example runs quickly
  # do not use in practice
  chains = 1, iter = 500
)
# Look at mean predictions for some cars (epred) and compare to
# the exponentiated mu parameter of the lognormal distribution (linpred).
# Notice how they are NOT the same. This is because exp(mu) for a
# lognormal distribution is equal to its median, not its mean.
mtcars %>%
  select(hp, cyl, mpg) %>%
  add_epred_rvars(m_mpg) %>%
  add_linpred_rvars(m_mpg, value = "mu") %>%
  mutate(expmu = exp(mu), .epred - expmu)
# plot intervals around conditional means (epred_rvars)
mtcars %>%
  group_by(cyl) %>%
  data_grid(hp = seq_range(hp, n = 101)) %>%
  add_epred_rvars(m_mpg) %>%
  ggplot(aes(x = hp, color = ordered(cyl), fill = ordered(cyl))) +
  stat_lineribbon(aes(dist = .epred), .width = c(.95, .8, .5), alpha = 1/3) +
  geom_point(aes(y = mpg), data = mtcars) +
  scale_color_brewer(palette = "Dark2") +
  scale_fill_brewer(palette = "Set2")
# plot posterior predictive intervals (predicted_rvars)
mtcars %>%
  group_by(cyl) %>%
  data_grid(hp = seq_range(hp, n = 101)) %>%
  add_predicted_rvars(m_mpg) %>%
  ggplot(aes(x = hp, color = ordered(cyl), fill = ordered(cyl))) +
  stat_lineribbon(aes(dist = .prediction), .width = c(.95, .8, .5), alpha = 1/3) +
  geom_point(aes(y = mpg), data = mtcars) +
  scale_color_brewer(palette = "Dark2") +
  scale_fill_brewer(palette = "Set2")
}