ggaverage: Get marginal effects from model terms

Description

ggpredict() computes predicted (fitted) values for the response, at the margin of specific values from certain model terms, where additional model terms indicate the grouping structure. ggaverage() computes the average predicted values. The result is returned as a tidy data frame.

mem() is an alias for ggpredict() (marginal effects at the mean); ame() is an alias for ggaverage() (average marginal effects).

Usage

ggaverage(model, terms, ci.lvl = 0.95, type = c("fe", "re"),
  typical = c("mean", "median"), ...)
ame(model, terms, ci.lvl = 0.95, type = c("fe", "re"), typical = c("mean",
  "median"), ...)
ggpredict(model, terms, ci.lvl = 0.95, type = c("fe", "re"),
  full.data = FALSE, typical = c("mean", "median"), ...)
mem(model, terms, ci.lvl = 0.95, type = c("fe", "re"), full.data = FALSE,
  typical = c("mean", "median"), ...)

Arguments

model

A fitted model object, or a list of model objects. Any model that supports common methods like predict(), family() or model.frame() should work.

terms

Character vector with the names of those terms from model for which marginal effects should be displayed. At least one term is required to calculate effects; the maximum length is three terms, where the second and third term indicate the groups, i.e. predictions of the first term are grouped by the levels of the second (and third) term. Indicating levels in square brackets allows for selecting only specific groups. Term name and levels in brackets must be separated by a whitespace character, e.g. terms = c("age", "education [1,3]"). See 'Examples'. All remaining covariates that are not specified in terms are either held constant (if full.data = FALSE, the default) or set to the values from the observations (if full.data = TRUE, i.e. they are kept as they happen to be; see 'Details').

ci.lvl

Numeric, the level of the confidence intervals. For ggpredict(), use ci.lvl = NA if confidence intervals should not be calculated (for instance, to save computation time).

type

Character, applies only to mixed effects models. Indicates whether predicted values should be conditioned on random effects (type = "re") or on fixed effects only (type = "fe", the default).

typical

Character vector, naming the function to be applied to the covariates over which the effect is "averaged". The default is "mean".

...

Further arguments passed down to predict().

full.data

Logical, if TRUE, the returned data frame contains predictions for all observations. This data frame also has columns for residuals and observed values, and can also be used to plot a scatter plot of all data points or fitted values. If FALSE (the default), the returned data frame only contains predictions for all combinations of unique values of the model's predictors. Residuals and observed values are set to NA. Usually, this argument is only used internally by ggaverage().

Value

A tibble (with ggeffects class attribute) with consistent data columns:

x: the values of the first term in terms, used as x-position in plots.
predicted: the predicted values, used as y-position in plots.
conf.low: the lower bound of the confidence interval for the predicted values.
conf.high: the upper bound of the confidence interval for the predicted values.
observed: if full.data = TRUE, this column contains the observed values (the response vector).
residuals: if full.data = TRUE, this column contains residuals.
group: the grouping level from the second term in terms, used as grouping-aesthetics in plots.
facet: the grouping level from the third term in terms, used to indicate facets in plots.
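
As a rough illustration of how the x, predicted, conf.low and conf.high columns relate to each other, the following base R sketch builds a similar data frame by hand. It uses the built-in mtcars data and a normal-approximation interval; both are assumptions of this sketch, and ggpredict() may compute its intervals differently.

```r
# Sketch only: base R on built-in data, not the ggeffects implementation.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# a few values of the focal term, remaining covariate held at its mean
newdata <- data.frame(wt = c(2.5, 3, 3.5), hp = mean(mtcars$hp))

ci.lvl <- 0.95
pr <- predict(fit, newdata = newdata, se.fit = TRUE)

# critical value from the normal distribution (assumption of this sketch)
crit <- qnorm((1 + ci.lvl) / 2)

out <- data.frame(
  x = newdata$wt,
  predicted = pr$fit,
  conf.low = pr$fit - crit * pr$se.fit,
  conf.high = pr$fit + crit * pr$se.fit
)
out
```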

Details

Currently supported model-objects are: lm, glm, lme, lmer, glmer, glmer.nb, nlmer, glmmTMB, gam, vgam, gamm, gamm4, gls, gee, plm, lrm, svyglm, svyglm.nb. Other models not listed here are passed to a generic predict-function and might work as well, or may work with ggeffect(), which effectively does the same as ggpredict().

If full.data = FALSE, expand.grid() is called on all unique combinations of model.frame(model)[, terms] and used as newdata-argument for predict(). In this case, all remaining covariates that are not specified in terms are held constant. Numeric values are set to the mean (unless changed with the typical-argument), factors are set to their reference level and character vectors to the most common element.
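
The data grid described above can be approximated by hand in base R. The sketch below is not the ggeffects implementation; it uses the built-in mtcars data, only numeric covariates, and illustrative variable names (terms, others, typical) to show the idea of varying the focal term while holding everything else constant.

```r
# Sketch only: base R on built-in data, not the ggeffects internals.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
mf <- model.frame(fit)

terms <- "wt"                              # focal term
others <- setdiff(names(mf)[-1], terms)    # remaining covariates (response dropped)
typical <- mean                            # use median to mimic typical = "median"

# all unique values of the focal term(s) ...
newdata <- expand.grid(lapply(mf[terms], unique))

# ... with the remaining numeric covariates held constant
for (v in others) newdata[[v]] <- typical(mf[[v]])

newdata$predicted <- predict(fit, newdata = newdata)
head(newdata[order(newdata$wt), ])
```

Factors and character vectors would need their own rule (reference level or most common element, as described above); this sketch leaves them out to stay short.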

ggaverage() computes the average predicted values, by calling ggpredict() with full.data = TRUE, where argument newdata = model.frame(model) is used in predict(). Hence, predictions are made on the model data. In this case, all remaining covariates that are not specified in terms are not held constant, but vary between observations (and are kept as they happen to be). The predicted values are then averaged for each group (if any).
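
The averaging step can be sketched in base R as follows. This is a simplified illustration on the built-in mtcars data, not the actual ggaverage() code: predictions are made on the model data, so the other covariates stay as observed, and are then averaged within each value of the focal term.

```r
# Sketch only: base R on built-in data, not the ggeffects internals.
fit <- lm(mpg ~ wt + cyl + hp, data = mtcars)
mf <- model.frame(fit)

# predictions on the model data: covariates are kept as they happen to be
mf$predicted <- predict(fit, newdata = mf)

# average the predictions within each value of the focal term (here: cyl)
avg <- aggregate(predicted ~ cyl, data = mf, FUN = mean)
avg
```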

Thus, ggpredict() can be considered as calculating marginal effects at the mean, while ggaverage() computes average marginal effects.

Examples

# NOT RUN {
library(ggeffects)
data(efc)
fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)

ggpredict(fit, terms = "c12hour")
ggpredict(fit, terms = "c12hour", full.data = TRUE)
ggpredict(fit, terms = c("c12hour", "c172code"))
ggpredict(fit, terms = c("c12hour", "c172code", "c161sex"))

# to plot ggeffects-objects, you can use the 'plot()'-function.
# the following examples show how to build your ggplot by hand.

# plot predicted values, remaining covariates held constant
library(ggplot2)
mydf <- ggpredict(fit, terms = "c12hour")
ggplot(mydf, aes(x, predicted)) +
  geom_line() +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = .1)

# with "full.data = TRUE", remaining covariates vary between
# observations, so fitted values can be plotted
mydf <- ggpredict(fit, terms = "c12hour", full.data = TRUE)
ggplot(mydf, aes(x, predicted)) + geom_point()

# you can add a smoothing-geom to show the linear trend of fitted values
ggplot(mydf, aes(x, predicted)) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_point()

# three variables, so we can use facets and groups
mydf <- ggpredict(
  fit,
  terms = c("c12hour", "c161sex", "c172code"),
  full.data = TRUE
)
ggplot(mydf, aes(x = x, y = predicted, colour = group)) +
  stat_smooth(method = "lm", se = FALSE) +
  facet_wrap(~facet, ncol = 2)

# average marginal effects
mydf <- ggaverage(fit, terms = c("c12hour", "c172code"))
ggplot(mydf, aes(x = x, y = predicted, colour = group)) +
  stat_smooth(method = "lm", se = FALSE)

# select specific levels for grouping terms
mydf <- ggpredict(fit, terms = c("c12hour", "c172code [1,3]", "c161sex"))
ggplot(mydf, aes(x = x, y = predicted, colour = group)) +
  stat_smooth(method = "lm", se = FALSE) +
  facet_wrap(~facet) +
  labs(
    y = get_y_title(mydf),
    x = get_x_title(mydf),
    colour = get_legend_title(mydf)
  )

# level indication also works for factors with non-numeric levels
# and in combination with numeric levels for other variables
library(sjmisc)
data(efc)
efc$c172code <- to_label(efc$c172code)
fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)
ggpredict(fit, terms = c("c12hour",
  "c172code [low level of education, high level of education]",
  "c161sex [1]"))

# use categorical value on x-axis, use axis-labels, add error bars
dat <- ggpredict(fit, terms = c("c172code", "c161sex"))
ggplot(dat, aes(x, predicted, colour = group)) +
  geom_point(position = position_dodge(.1)) +
  geom_errorbar(
    aes(ymin = conf.low, ymax = conf.high),
    position = position_dodge(.1)
  ) +
  scale_x_continuous(breaks = 1:3, labels = get_x_labels(dat))

# 3-way-interaction with 2 continuous variables
data(efc)
# make categorical
efc$c161sex <- to_factor(efc$c161sex)
fit <- lm(neg_c_7 ~ c12hour * barthtot * c161sex, data = efc)
# select only levels 30, 50 and 70 from continuous variable Barthel-Index
dat <- ggpredict(fit, terms = c("c12hour", "barthtot [30,50,70]", "c161sex"))
ggplot(dat, aes(x = x, y = predicted, colour = group)) +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
  facet_wrap(~facet) +
  labs(
    colour = get_legend_title(dat),
    x = get_x_title(dat),
    y = get_y_title(dat),
    title = get_title(dat)
  )

# or with ggeffects' plot-method
plot(dat, ci = FALSE)
# }
