Supported Models
A list of supported models can be found at the package website.
Support for models varies by function: although ggpredict(), ggemmeans() and
ggeffect() support most models, some models are supported by only one of the
three functions.
Difference between ggpredict() and ggeffect() or ggemmeans()
ggpredict() calls predict(), while ggeffect() calls effects::Effect() and
ggemmeans() calls emmeans::emmeans() to compute predicted values.
Thus, effects returned by ggpredict() can be described as conditional effects
(i.e. these are conditioned on certain (reference) levels of factors), while
ggemmeans() and ggeffect() return marginal means, since the effects are
"marginalized" (or "averaged") over the levels of factors (or values of
character vectors). Therefore, ggpredict() differs from ggeffect() and
ggemmeans() in how factors and character vectors are held constant:
ggpredict() uses the reference level (or "lowest" value in case of character
vectors), while ggeffect() and ggemmeans() compute a kind of "average" value,
which represents the proportions of each factor's category. Use condition to
set a specific level for factors in ggemmeans(), so factors are not averaged
over their categories, but held constant at a given level.
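The distinction between conditional and marginal predictions can be sketched in base R, without any of the three functions. This is only an illustration of the idea described above, not the internal code of ggpredict(), ggeffect() or ggemmeans(); the data set (mtcars), the derived factor cyl_f and the proportional weighting are assumptions made for the example.

```r
# Illustrative data: mtcars with cyl turned into a factor (reference level "4").
d <- mtcars
d$cyl_f <- factor(d$cyl)
m <- lm(mpg ~ hp + cyl_f, data = d)

lv <- levels(d$cyl_f)

# Conditional prediction: hp at its mean, factor held at its reference level.
nd_ref <- data.frame(hp = mean(d$hp), cyl_f = factor(lv[1], levels = lv))
conditional <- unname(predict(m, newdata = nd_ref))

# Marginal prediction: predict for every factor level, then average the
# predictions, weighted by the observed proportion of each level.
nd_all <- data.frame(hp = mean(d$hp), cyl_f = factor(lv, levels = lv))
w <- as.numeric(prop.table(table(d$cyl_f)))
marginal <- sum(predict(m, newdata = nd_all) * w)

c(conditional = conditional, marginal = marginal)
```

For this linear model, the proportionally weighted marginal prediction at the mean of hp equals the overall mean of mpg, whereas the conditional prediction refers only to cars in the reference category.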
Marginal Effects and Adjusted Predictions at Specific Values
Meaningful values of focal terms can be specified via the terms argument.
Specifying meaningful or representative values as a string pattern is the
preferred way in the ggeffects package. However, it is also possible to use a
list() for the focal terms if you prefer the "classical" R way, which is
described in this vignette.
Indicating levels in square brackets allows for selecting only certain
groups, values or value ranges. The term name and the opening bracket must be
separated by a whitespace character, e.g.
terms = c("age", "education [1,3]"). Numeric ranges, separated by a colon,
are also allowed: terms = c("education", "age [30:60]"). The step size for
ranges can be adjusted using by, e.g. terms = "age [30:60 by=5]".
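As a rough usage sketch of the bracket syntax, the following fits a small model and requests predictions at a range with a custom step size and at two selected values. The data set (mtcars) and the model are assumptions chosen for illustration only, and the exact set of supported tags may depend on the installed ggeffects version. The call is guarded so that the snippet also runs when ggeffects is not installed.

```r
# Illustrative model; mtcars and the chosen predictors are assumptions.
m <- lm(mpg ~ hp + wt + cyl, data = mtcars)

# Focal terms: hp from 100 to 200 in steps of 50, cyl at the values 4 and 8.
focal <- c("hp [100:200 by=50]", "cyl [4,8]")

if (requireNamespace("ggeffects", quietly = TRUE)) {
  pr <- ggeffects::ggpredict(m, terms = focal)
  print(pr)
}
```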
The terms argument also supports the same shortcuts as the values argument in
values_at(). So terms = "age [meansd]" would return predictions for the
values one standard deviation below the mean age, the mean age and one
standard deviation above the mean age. terms = "age [quart2]" would calculate
predictions at the lower quartile, the median and the upper quartile of age.
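To make the two shortcuts concrete, the values they are described as selecting can be computed by hand in base R. This is a sketch of the description above, not the code of values_at(); the age vector is made up.

```r
age <- c(23, 31, 38, 45, 52, 60, 67, 74)  # made-up ages

# "meansd": mean minus one SD, mean, mean plus one SD
meansd <- mean(age) + c(-1, 0, 1) * sd(age)

# "quart2": lower quartile, median, upper quartile
quart2 <- unname(quantile(age, probs = c(0.25, 0.5, 0.75)))

meansd
quart2
```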
Furthermore, it is possible to specify a function name. Values for
predictions will then be transformed, e.g. terms = "income [exp]". This is
useful when model predictors were transformed for fitting the model and
should be back-transformed to the original scale for predictions. It is also
possible to define your own functions (see this vignette).
Instead of a function, it is also possible to give the name of a variable
that holds specific values, e.g. to define a vector v = c(1000, 2000, 3000)
and then use terms = "income [v]".
You can take a random sample of any size with sample=n, e.g.
terms = "income [sample=8]", which will sample eight values from all possible
values of the variable income. This option is especially useful for plotting
predictions at certain levels of random effects group levels, where the group
factor has too many levels to be plotted completely. For more details, see
this vignette.
Finally, for numeric vectors for which no specific values are given, a
"pretty range" is calculated (see pretty_range()), to avoid memory allocation
problems for vectors with many unique values. If a numeric vector is
specified as second or third term (i.e. if this vector represents a grouping
structure), representative values (see values_at()) are chosen (unless other
values are specified). If all values for a numeric vector should be used to
compute predictions, you may use e.g. terms = "age [all]". See also package
vignettes.
To create a pretty range that should be smaller or larger than the default
range (i.e. the range used if no specific values are given), use the n tag,
e.g. terms="age [n=5]" or terms="age [n=12]". Larger values for n return a
larger number of predicted values.
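The effect of n can be illustrated with base R's pretty(), which produces a sequence of "round" values covering a range. This is only an analogy: pretty_range() may choose its values differently, and the variable used here (mtcars$hp) is an assumption for the example.

```r
x <- mtcars$hp  # illustrative numeric predictor with many unique values

coarse <- pretty(x, n = 5)   # few round values across the range of x
fine   <- pretty(x, n = 12)  # more round values across the same range

length(coarse)
length(fine)
```

Both sequences cover the observed range of x; the larger n only makes the grid denser.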
Holding covariates at constant values
For ggpredict(), a data grid is constructed, roughly comparable to
expand.grid() on all unique combinations of model.frame(model)[, terms].
This data grid (see data_grid()) is used as the newdata argument for
predict(). In this case, all remaining covariates that are not specified in
terms are held constant: Numeric values are set to the mean (unless changed
with the condition or typical argument), integer values are set to their
median, factors are set to their reference level (may also be changed with
condition) and character vectors to their mode (most common element).
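The construction described above can be approximated by hand with expand.grid() and predict(). This is a simplified sketch, not the implementation of data_grid(); the data set (mtcars), the derived factor am_f and the chosen focal values are assumptions for the example.

```r
# Illustrative model with one focal term (hp), one numeric covariate (wt)
# and one factor covariate (am_f).
d <- mtcars
d$am_f <- factor(d$am, labels = c("automatic", "manual"))
m <- lm(mpg ~ hp + wt + am_f, data = d)

# Data grid: focal term varies, remaining covariates are held constant
# (numeric at its mean, factor at its reference level).
grid <- expand.grid(
  hp   = c(100, 150, 200),
  wt   = mean(d$wt),
  am_f = factor(levels(d$am_f)[1], levels = levels(d$am_f))
)

# The grid is passed to predict() as newdata.
grid$predicted <- predict(m, newdata = grid)
grid
```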
ggeffect() and ggemmeans(), by default, set remaining numeric covariates to
their mean value, while for factors, a kind of "average" value, which
represents the proportions of each factor's category, is used. The same
applies to character vectors: ggemmeans() averages over the distribution of
unique values in a character vector, similar to how factors are treated. For
ggemmeans(), use condition to set a specific level for factors so that these
are not averaged over their categories, but held constant at the given level.
Bayesian Regression Models
ggpredict() also works with Stan-models from the rstanarm or brms-packages.
The predicted values are the median value of all drawn posterior samples. The
confidence intervals for Stan-models are Bayesian predictive intervals. By
default (i.e. ppd = FALSE), the predictions are based on
rstantools::posterior_linpred() and hence have some limitations: the
uncertainty of the error term is not taken into account. The recommendation
is to use the posterior predictive distribution
(rstantools::posterior_predict()), i.e. to set ppd = TRUE.
Zero-Inflated and Zero-Inflated Mixed Models with brms
Models of class brmsfit always condition on the zero-inflation component, if
the model has such a component. Hence, there is no type = "zero_inflated" nor
type = "zi_random" for brmsfit-models, because predictions are based on draws
of the posterior distribution, which already account for the zero-inflation
part of the model.
Zero-Inflated and Zero-Inflated Mixed Models with glmmTMB
If model is of class glmmTMB, hurdle, zeroinfl or zerotrunc, simulations from
a multivariate normal distribution (see ?MASS::mvrnorm) are drawn to
calculate mu*(1-p). Confidence intervals are then based on quantiles of these
results. For type = "zi_random", prediction intervals also take the
uncertainty in the random-effect parameters into account (see also Brooks et
al. 2017, pp.391-392 for details).
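The simulation idea can be sketched as follows. All coefficient values and covariance matrices are made up, a log link for the conditional part and a logit link for the zero-inflation part are assumed, and the code is an illustration of the general approach rather than the package's implementation.

```r
set.seed(123)

# Hypothetical estimates (intercept, slope) and their covariance matrices.
beta_cond <- c(1.2, 0.3)    # conditional (count) component, log link assumed
V_cond    <- matrix(c(0.04, 0.001, 0.001, 0.01), nrow = 2)
beta_zi   <- c(-0.5, 0.2)   # zero-inflation component, logit link assumed
V_zi      <- matrix(c(0.09, 0.002, 0.002, 0.02), nrow = 2)

x     <- c(1, 2)            # design row: intercept and predictor value 2
n_sim <- 1000

# Draw coefficient vectors from multivariate normal distributions.
sim_cond <- MASS::mvrnorm(n_sim, mu = beta_cond, Sigma = V_cond)
sim_zi   <- MASS::mvrnorm(n_sim, mu = beta_zi, Sigma = V_zi)

mu <- exp(sim_cond %*% x)     # simulated conditional means
p  <- plogis(sim_zi %*% x)    # simulated zero-inflation probabilities

pred <- as.vector(mu * (1 - p))          # mu * (1 - p) for every draw
ci   <- quantile(pred, probs = c(0.025, 0.975))
ci
```

The interval limits are simply the 2.5% and 97.5% quantiles of the simulated mu*(1-p) values.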
An alternative for models fitted with glmmTMB that takes all model
uncertainties into account is simulation based on simulate(), which is used
when type = "sim" (see Brooks et al. 2017, pp.392-393 for details).
MixMod-models from GLMMadaptive
Predicted values for the fixed effects component (type = "fixed" or
type = "zero_inflated") are based on predict(..., type = "mean_subject"),
while predicted values for random effects components (type = "random" or
type = "zi_random") are calculated with
predict(..., type = "subject_specific") (see ?GLMMadaptive::predict.MixMod
for details). The latter option requires the response variable to be defined
in the newdata-argument of predict(), which will be set to its typical value
(see values_at()).