interact_plot() plots regression lines at user-specified levels of a moderator variable to explore interactions. The plotting is done with ggplot2 rather than base graphics, which some similar functions use.
interact_plot(model, pred, modx, modxvals = NULL, mod2 = NULL,
mod2vals = NULL, centered = NULL, scale = FALSE, n.sd = 1,
plot.points = FALSE, interval = FALSE, int.type = c("confidence",
"prediction"), int.width = 0.95, outcome.scale = "response",
linearity.check = FALSE, set.offset = 1, x.label = NULL,
y.label = NULL, pred.labels = NULL, modx.labels = NULL,
mod2.labels = NULL, main.title = NULL, legend.main = NULL,
color.class = NULL, line.thickness = 1.1, vary.lty = TRUE,
jitter = 0.1, standardize = NULL)
pred: The name of the predictor variable involved in the interaction.
modx: The name of the moderator variable involved in the interaction.
modxvals: For which values of the moderator should lines be plotted? Default is NULL. If NULL, then the customary +/- 1 standard deviation from the mean as well as the mean itself are used for continuous moderators. If the moderator is a factor variable and modxvals is NULL, each level of the factor is included. If "plus-minus", plots lines when the moderator is at +/- 1 standard deviation without the mean. You may also choose "terciles" to split the data into equally-sized groups and choose the point at the mean of each of those groups.
mod2: Optional. The name of the second moderator variable involved in the interaction.
mod2vals: By which values of the second moderator should the plot be facetted? That is, there will be a separate plot for each level of this moderator. Defaults are the same as modxvals.
centered: A vector of quoted variable names that are to be mean-centered. If NULL, all non-focal predictors are centered. If not NULL, only the user-specified predictors are centered. You can also pass "none" or "all". The response variable is not centered unless specified directly.
scale: Logical. Would you like to standardize the variables that are centered? Default is FALSE, but if TRUE it will standardize variables specified by the centered argument. Note that non-focal predictors are centered when centered = NULL, its default.
n.sd: How many standard deviations should be used if scale = TRUE? Default is 1, but some prefer 2.
plot.points: Logical. If TRUE, plots the actual data points as a scatterplot on top of the interaction lines. The color of the dots will be based on their moderator value.
interval: Logical. If TRUE, plots confidence/prediction intervals around the line using geom_ribbon. Not supported for merMod models.
int.type: Type of interval to plot. Options are "confidence" or "prediction". Default is confidence interval.
int.width: How large should the interval be, relative to the standard error? The default, .95, corresponds to roughly 1.96 standard errors and a .05 alpha level for values outside the range. In other words, for a confidence interval, .95 is analogous to a 95% confidence interval.
outcome.scale: For nonlinear models (i.e., GLMs), should the outcome variable be plotted on the link scale (e.g., log odds for logit models) or the original scale (e.g., predicted probabilities for logit models)? The default is "response", which is the original scale. For the link scale, which will show straight lines rather than curves, use "link".
linearity.check: For two-way interactions only. If TRUE, plots a pane for each level of the moderator and superimposes a loess smoothed line (in gray) over the plot. This enables you to see if the effect is linear through the span of the moderator. See Hainmueller et al. (2016) in the references for more details on the intuition behind this. It is recommended that you also set plot.points = TRUE and use modxvals = "terciles" with this option.
set.offset: For models with an offset (e.g., Poisson models), sets an offset for the predicted values. All predicted values will have the same offset. By default, this is set to 1, which makes the predicted values a proportion. See details for more about offset support.
x.label: A character object specifying the desired x-axis label. If NULL, the variable name is used.
y.label: A character object specifying the desired y-axis label. If NULL, the variable name is used.
pred.labels: A character vector of 2 labels for the predictor if it is a 2-level factor or a continuous variable with only 2 values. If NULL, the default, the factor labels are used.
modx.labels: A character vector of labels for each level of the moderator, provided in the same order as the modxvals argument. If NULL, the values themselves are used as labels unless modxvals is also NULL. In that case, "+1 SD" and "-1 SD" are used.
mod2.labels: A character vector of labels for each level of the second moderator, provided in the same order as the mod2vals argument. If NULL, the values themselves are used as labels unless mod2vals is also NULL. In that case, "+1 SD" and "-1 SD" are used.
main.title: A character object that will be used as an overall title for the plot. If NULL, no main title is used.
legend.main: A character object that will be used as the title that appears above the legend. If NULL, the name of the moderating variable is used.
color.class: Any palette argument accepted by scale_colour_brewer. Default is "Set2" for factor moderators, "Blues" for +/- SD and user-specified modxvals values.
line.thickness: How thick should the plotted lines be? Default is 1.1; ggplot's default is 1.
vary.lty: Should the resulting plot use a different line type for each line in addition to colors? Defaults to TRUE.
jitter: How much should the observed values shown with plot.points be "jittered" via ggplot2::position_jitter()? When there are many points near each other, jittering moves them a small amount to keep them from totally overlapping. In some cases, though, it can add confusion since it may make points appear to be outside the boundaries of observed values or cause other visual issues. Default is 0.1, but set to 0 if you want no jittering.
standardize: Deprecated. Equivalent to scale. Please change your scripts to use scale instead as this argument will be removed in the future.
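Two of the defaults described above can be reproduced by hand. The following is a minimal base R sketch, not the package's internal code; the moderator x is a stand-in taken from mtcars, and the interval multiplier uses the normal approximation implied by "roughly 1.96 standard errors".

```r
# Stand-in continuous moderator (an assumption for illustration only)
x <- mtcars$wt

# modxvals = NULL: lines at the mean and at +/- 1 standard deviation
default_vals <- mean(x) + c(-1, 0, 1) * sd(x)
names(default_vals) <- c("- 1 SD", "Mean", "+ 1 SD")

# modxvals = "plus-minus": the same values without the mean
plus_minus_vals <- default_vals[c("- 1 SD", "+ 1 SD")]

# int.width = 0.95: number of standard errors on each side of the line,
# using the normal approximation
int_width <- 0.95
se_multiplier <- qnorm(1 - (1 - int_width) / 2)

print(default_vals)
print(plus_minus_vals)
print(round(se_multiplier, 2))
```

Passing a numeric vector to modxvals replaces these defaults with values of your own choosing.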
The function returns a ggplot object, which can be treated like a user-created plot and expanded upon as such.
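Because the return value is an ordinary ggplot object, further layers can be added with +. The sketch below assumes interact_plot() is available from jtools (as in the version documented here) or, in later releases, from the interactions package; the theme and caption are arbitrary illustrative choices.

```r
library(ggplot2)

# Assumption: interact_plot() is exported by jtools in the version documented
# here; later releases provide it in the 'interactions' package instead.
interact_plot <- if (requireNamespace("interactions", quietly = TRUE)) {
  interactions::interact_plot
} else {
  jtools::interact_plot
}

states <- as.data.frame(state.x77)
fit <- lm(Income ~ Murder * Illiteracy, data = states)

p <- interact_plot(fit, pred = Murder, modx = Illiteracy)

# Add ordinary ggplot2 components to the returned plot
p2 <- p + theme_minimal() + labs(caption = "Data: state.x77")
print(p2)
```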
This function provides a means for plotting conditional effects
for the purpose of exploring interactions in the context of regression.
You must have the package ggplot2 installed to benefit from these plotting functions.
The function is designed for two- and three-way interactions. For additional terms, the effects package may be better suited to the task.
This function supports nonlinear and generalized linear models and by default will plot them on their original scale (outcome.scale = "response").
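The difference between the two outcome scales can be seen with base R's predict(), independent of this function. This is a minimal sketch on a small logit model fitted to mtcars (chosen only for illustration): predictions on the link scale are linear in the predictor, while predictions on the response scale are the inverse-logit of those values and therefore curved.

```r
# Simple logit model on a built-in data set (illustrative only)
fit <- glm(am ~ wt, data = mtcars, family = binomial)

grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50))

# Link scale: log odds, a straight line in wt
eta <- predict(fit, newdata = grid, type = "link")

# Response scale: predicted probabilities, a curve bounded by 0 and 1
p <- predict(fit, newdata = grid, type = "response")

# The two scales are related by the inverse link function
print(all.equal(unname(p), unname(plogis(eta))))

# Second differences are (numerically) zero for a straight line
print(max(abs(diff(eta, differences = 2))))
```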
While mixed effects models from lme4 are supported, only the fixed effects are plotted. lme4 does not provide confidence intervals, so they are not supported with this function either.
Note: to use transformed predictors, e.g., log(variable), put its name in quotes or backticks in the argument.
Details on how observed data are split in multi-pane plots:
If you set plot.points = TRUE and request a multi-pane (facetted) plot either with a second moderator or linearity.check = TRUE, the observed data are split into as many groups as there are panes and plotted separately. If the moderator is a factor, then the way this happens will be very intuitive since it's obvious which values go in which pane. The rest of this section will address the case of continuous moderators.
My recommendation is that you use modxvals = "terciles" or mod2vals = "terciles" when you want to plot observed data on multi-pane plots. When you do, the data are split into three approximately equal-sized groups with the lowest third, middle third, and highest third of the data split accordingly. You can replicate this procedure using Hmisc::cut2() with g = 3 from the Hmisc package. Sometimes, the groups will not be equal in size because the number of observations is not divisible by 3 and/or there are multiple observations with the same value at one of the cut points.
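A tercile split of this kind can be approximated in base R with quantile() and cut(). This is a sketch under stated assumptions rather than the function's own code: the moderator is a stand-in from mtcars, and tied values at a cut point may be assigned differently than Hmisc::cut2() would assign them.

```r
# Stand-in continuous moderator (an assumption for illustration only)
x <- mtcars$mpg

# Cut points at the minimum, the 1/3 and 2/3 quantiles, and the maximum
cuts <- quantile(x, probs = c(0, 1/3, 2/3, 1))

groups <- cut(x, breaks = cuts, include.lowest = TRUE,
              labels = c("Lower third", "Middle third", "Upper third"))

# Group sizes are only approximately equal (32 is not divisible by 3)
print(table(groups))

# The mean of each group is the moderator value at which a line is plotted
group_means <- tapply(x, groups, mean)
print(group_means)
```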
Otherwise, a more ad hoc procedure is used to split the data. Quantiles are found for each mod2vals or modxvals value. These are not the quantiles used to split the data, however, since we want the plotted lines to represent the slope at a typical value in the group. The next step, then, is to take the mean of each pair of neighboring quantiles and use these as the cut points.
For example, if the mod2vals are at the 25th, 50th, and 75th percentiles of the distribution of the moderator, the data will be split at the 37.5th and 62.5th percentiles. When the variable is normally distributed, this will correspond fairly closely to using terciles.
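The arithmetic in this example can be written out in a few lines of base R. The sketch starts from the percentile ranks of the requested moderator values (taken as given here) and averages neighboring ranks to get the cut points; the variable names and the mtcars moderator are illustrative assumptions, not the package's internals.

```r
# Stand-in continuous moderator (an assumption for illustration only)
x <- mtcars$mpg

# Percentile ranks of the requested moderator values
probs <- c(0.25, 0.50, 0.75)

# Mean of each pair of neighboring ranks gives the split percentiles
cut_probs <- (head(probs, -1) + tail(probs, -1)) / 2
print(cut_probs)

# Convert the split percentiles to values of the moderator and group the data
cut_points <- quantile(x, probs = cut_probs)
groups <- cut(x, breaks = c(-Inf, cut_points, Inf),
              labels = c("Low", "Middle", "High"))
print(table(groups))
```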
Info about offsets:
Offsets are partially supported by this function with important limitations. First of all, only a single offset per model is supported. Second, it is best in general to specify offsets with the offset argument of the model fitting function rather than in the formula. If it is specified in the formula with a svyglm, this function will stop with an error message.
It is also advised not to do any transformations to the offset other than the common log transformation. If you apply a log transform, this function will deal with it sensibly. So if your offset is a logged count, the exposure you set will be the non-logged version, which is much easier to wrap one's head around. For any other transformation you may apply, or if you apply no transformation at all, the exposures used will be the post-transformation number (which is by default 1).
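To make the logged-offset case concrete, here is a minimal base R sketch using simulated count data (all names and values are illustrative assumptions). The offset is passed through the offset argument of glm(), and predictions are computed by hand at a common exposure of 1 on the non-logged scale, which yields an expected count per unit of exposure.

```r
# Simulated counts with varying exposure (illustrative only)
set.seed(1)
n <- 200
d <- data.frame(x = rnorm(n), exposure = runif(n, 1, 10))
d$y <- rpois(n, lambda = d$exposure * exp(0.2 + 0.5 * d$x))

# Offset supplied via the offset argument rather than in the formula
fit <- glm(y ~ x, family = poisson, offset = log(exposure), data = d)

# A single exposure for all predictions, on the non-logged scale
set_offset <- 1
new_x <- c(-1, 0, 1)

# Linear predictor plus the logged exposure; log(1) = 0, so the result is
# the expected count per unit of exposure
eta <- coef(fit)[["(Intercept)"]] + coef(fit)[["x"]] * new_x + log(set_offset)
rate <- exp(eta)
print(rate)
```

Changing set_offset to, say, 10 multiplies every predicted value by 10, since the logged exposure enters the linear predictor additively.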
Bauer, D. J., & Curran, P. J. (2005). Probing interactions in fixed and multilevel regression: Inferential and graphical techniques. Multivariate Behavioral Research, 40(3), 373-400. http://dx.doi.org/10.1207/s15327906mbr4003_5
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analyses for the behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Hainmueller, J., Mummolo, J., & Xu, Y. (2016). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2739221
plotSlopes from rockchalk performs a similar function, but with R's base graphics; this function is meant, in part, to emulate its features.
sim_slopes performs a simple slopes analysis with a similar argument syntax to this function.
Other interaction tools: cat_plot, johnson_neyman, probe_interaction, sim_slopes
# NOT RUN {
# Using a fitted lm model
states <- as.data.frame(state.x77)
states$HSGrad <- states$`HS Grad`
fit <- lm(Income ~ HSGrad + Murder * Illiteracy,
data = states)
interact_plot(model = fit, pred = Murder,
modx = Illiteracy)
# Using interval feature
fit <- lm(accel ~ mag * dist, data = attenu)
interact_plot(fit, pred = mag, modx = dist, interval = TRUE,
int.type = "confidence", int.width = .8)
# Using second moderator
fit <- lm(Income ~ HSGrad * Murder * Illiteracy,
data = states)
interact_plot(model = fit, pred = Murder,
modx = Illiteracy, mod2 = HSGrad)
# With svyglm
library(survey)
data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
data = apistrat, fpc = ~fpc)
regmodel <- svyglm(api00 ~ ell * meals, design = dstrat)
interact_plot(regmodel, pred = ell, modx = meals)
# With lme4
library(lme4)
data(VerbAgg)
mv <- glmer(r2 ~ Anger * mode + (1 | item), data = VerbAgg,
family = binomial,
control = glmerControl("bobyqa"))
interact_plot(mv, pred = Anger, modx = mode)
# }