The companion function to make_predictions(). This takes data from
make_predictions() (or elsewhere) and plots them like effect_plot(),
interact_plot(), and cat_plot(). Note that some
arguments will be ignored if the inputted predictions
plot_predictions(predictions, pred = NULL, modx = NULL, mod2 = NULL,
resp = NULL, data = NULL, geom = c("point", "line", "bar",
"boxplot"), plot.points = FALSE, interval = FALSE,
pred.values = NULL, modx.values = NULL, mod2.values = NULL,
linearity.check = FALSE, facet.modx = FALSE, x.label = NULL,
y.label = NULL, pred.labels = NULL, modx.labels = NULL,
mod2.labels = NULL, main.title = NULL, legend.main = NULL,
color.class = NULL, line.thickness = 1.1, vary.lty = NULL,
jitter = 0, weights = NULL, rug = FALSE, rug.sides = "b",
force.cat = FALSE, point.shape = FALSE, geom.alpha = NULL,
dodge.width = NULL, errorbar.width = NULL,
interval.geom = c("errorbar", "linerange"), pred.point.size = 3.5,
point.size = 1, ...)
Either the output from make_predictions() (an object of class
"predictions") or a data frame of predicted values.
The name of the predictor variable involved in the interaction. This can be a bare name or string.
The name of the moderator variable involved in the interaction. This can be a bare name or string.
Optional. The name of the second moderator variable involved in the interaction. This can be a bare name or string.
What is the name of the response variable? Use a string.
Optional, default is NULL. You may provide the data used to
fit the model. This can be a better way to get mean values for centering
and can be crucial for models with variable transformations in the formula
(e.g., log(x)) or polynomial terms (e.g., poly(x, 2)). If the function
detects problems that would likely be solved by providing the data with
this argument, it issues a warning and attempts to retrieve the original
data from the global environment.
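To see why the original data can matter, consider what a fitted model keeps when the formula contains a transformation. The following base R sketch is only an illustration (the model and the use of the built-in mtcars data are assumptions, not taken from the package). It shows that the model frame stores the transformed column under the name "log(hp)" rather than the raw variable, so raw values for means or centering are most reliably taken from the original data:

```r
# Illustrative model with a transformed predictor (mtcars ships with R)
fit <- lm(mpg ~ log(hp), data = mtcars)

# The model frame holds the transformed column, named "log(hp)", not raw "hp"
mf <- model.frame(fit)
frame_names <- names(mf)
print(frame_names)

# The raw predictor is taken from the original data instead
raw_mean <- mean(mtcars$hp)
```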
For factor predictors only: What type of plot should this be? There are several options here since the best way to visualize categorical interactions varies by context. Here are the options:
"point": The default. Simply plot the point estimates. You may want to
use point.shape = TRUE with this and you should also consider
interval = TRUE to visualize uncertainty.
"line": This connects observations across levels of the pred
variable with a line. This is a good option when the pred variable
is ordinal (ordered). You may still consider point.shape = TRUE, and
interval = TRUE is still a good idea.
"bar": A bar chart. Some call this a "dynamite plot."
Many applied researchers advise against this type of plot because it
does not represent the distribution of the observed data or the
uncertainty of the predictions very well. It is best to at least use the
interval = TRUE argument with this geom.
"boxplot": This geom plots a box-and-whisker plot. These can be useful
for understanding the distribution of the observed data without having
to plot all the observed points (especially helpful with larger data
sets). However, it is important to note the boxplots are not based
on the model whatsoever.
Logical. If TRUE, plots the actual data points as
a scatterplot on top of the interaction lines. The color of the dots will
be based on their moderator value.
Logical. If TRUE, plots confidence/prediction
intervals around the line using geom_ribbon.
Which values of the predictor should be included in the plot? By default, all levels are included.
For which values of the moderator should lines be plotted?
Default is NULL. If NULL, then the customary +/- 1 standard
deviation from the mean as well as the mean itself are used for continuous
moderators. If "plus-minus", plots lines when the moderator is at
+/- 1 standard deviation without the mean. You may also choose "terciles"
to split the data into equally-sized groups and choose the point at the
mean of each of those groups.
If the moderator is a factor variable and modx.values is NULL, each
level of the factor is included. You may specify any subset of the
factor levels (e.g., c("Level 1", "Level 3")) as long
as there is more than 1. The levels will be plotted in the order you
provide them, so this can be used to reorder levels as well.
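As a rough illustration of the three choices for a continuous moderator, the following base R sketch computes candidate values by hand. It is not the package's internal code: the variable names and the use of mtcars$wt as the moderator are assumptions made for the example, and the tercile computation is one plausible reading of the description above.

```r
# Hypothetical continuous moderator (mtcars ships with R)
modx <- mtcars$wt

# Default (modx.values = NULL): mean - 1 SD, mean, mean + 1 SD
default_vals <- mean(modx) + c(-1, 0, 1) * sd(modx)

# "plus-minus": +/- 1 SD from the mean, without the mean itself
plus_minus_vals <- mean(modx) + c(-1, 1) * sd(modx)

# "terciles": split into three roughly equal-sized groups,
# then take the mean of the moderator within each group
breaks <- quantile(modx, probs = c(0, 1/3, 2/3, 1))
groups <- cut(modx, breaks = breaks, include.lowest = TRUE)
tercile_vals <- as.numeric(tapply(modx, groups, mean))
```

Tercile-based values follow the observed distribution of the moderator, which can be preferable to mean +/- 1 SD when that distribution is skewed.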
By which values of the second moderator should the plot be
faceted? That is, there will be a separate plot for each level of this
moderator. Defaults are the same as modx.values.
For two-way interactions only. If TRUE, plots a
pane for each level of the moderator and superimposes a loess smoothed
line (in gray) over the plot. This enables you to see if the effect is
linear through the span of the moderator. See Hainmueller et al. (2016) in
the references for more details on the intuition behind this. It is
recommended that you also set plot.points = TRUE and use
modx.values = "terciles" with this option.
Create separate panels for each level of the moderator?
Default is FALSE, except when linearity.check is TRUE.
A character object specifying the desired x-axis label. If
NULL, the variable name is used.
A character object specifying the desired y-axis label. If
NULL, the variable name is used.
A character vector of 2 labels for the predictor if it is
a 2-level factor or a continuous variable with only 2 values. If
NULL, the default, the factor labels are used.
A character vector of labels for each level of the
moderator values, provided in the same order as the modx.values
argument. If NULL, the values themselves are used as labels unless
modx.values is also NULL. In that case, "+1 SD" and "-1 SD"
are used.
A character vector of labels for each level of the 2nd
moderator values, provided in the same order as the mod2.values
argument. If NULL, the values themselves are used as labels unless
mod2.values is also NULL. In that case, "+1 SD" and "-1 SD"
are used.
A character object that will be used as an overall title
for the plot. If NULL, no main title is used.
A character object that will be used as the title that
appears above the legend. If NULL, the name of the moderating
variable is used.
See jtools_colors for details on the types of arguments
accepted. Default is "CUD Bright" for factor
moderators, "Blues" for +/- SD and user-specified modx.values.
How thick should the plotted lines be? Default is 1.1; ggplot's default is 1.
Should the resulting plot have different shapes for each
line in addition to colors? Default is NULL, which will switch to FALSE
if the pred is a factor and TRUE if pred is continuous.
How much should plot.points observed values be "jittered"
via ggplot2::position_jitter()? When there are many points near each
other, jittering moves them a small amount to keep them from
totally overlapping. In some cases, though, it can add confusion since
it may make points appear to be outside the boundaries of observed
values or cause other visual issues. Default is 0, but try various
small values (e.g., 0.1) and increase as needed if your points are
overlapping too much. If the argument is a vector with two values,
then the first is assumed to be the jitter for width and the second
for the height.
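The width/height convention described above can be sketched with a small helper. normalize_jitter() is a hypothetical function written for this illustration, not part of the package, and it assumes that a single value is applied to both width and height.

```r
# Hypothetical helper: expand a jitter argument into width and height
normalize_jitter <- function(jitter) {
  stopifnot(is.numeric(jitter), length(jitter) %in% c(1, 2))
  # Assumption: a single value applies to both directions
  if (length(jitter) == 1) jitter <- rep(jitter, 2)
  # First value is the width jitter, second is the height jitter
  list(width = jitter[1], height = jitter[2])
}

both_directions <- normalize_jitter(0.1)
width_only <- normalize_jitter(c(0.1, 0))
```

Values like these correspond to the width and height arguments of ggplot2::position_jitter(). Passing c(0.1, 0) jitters points horizontally only, which avoids making observed responses appear to fall outside their actual range.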
If the data are weighted, provide a vector of weights here.
This is only used if plot.points = TRUE and data is not NULL.
Show a rug plot in the margins? This uses ggplot2::geom_rug()
to show the distribution of the predictor (top/bottom) and/or
response variable (left/right) in the original data. Default is
FALSE.
On which sides should rug plots appear? Default is "b", meaning bottom. "t" and/or "b" show the distribution of the predictor while "l" and/or "r" show the distribution of the response. "bl" is a good option to show both the predictor and response.
Force the predictor to be treated as if it is a factor, even if it isn't? Default is FALSE. Set to TRUE if you'd like to generate a type of plot normally reserved for categorical variables. This can be helpful for numeric variables that have a small number of unique values, for instance.
For plotted points---either of observed data or predicted values with the "point" or "line" geoms---should the shape of the points vary by the values of the factor? This is especially useful if you aim to be black and white printing- or colorblind-friendly.
What should the alpha aesthetic be for the plotted
lines/bars? Default is NULL, which means it is set depending on the value
of geom and plot.points.
What should the width argument to ggplot2::position_dodge()
be? Default is NULL, which means it is set
depending on the value of geom.
How wide should the error bars be? Default is NULL,
meaning it is set depending on the value of geom. Ignored if interval
is FALSE.
For categorical by categorical interactions.
One of "errorbar" or "linerange". If the former,
ggplot2::geom_errorbar() is used. If the latter,
ggplot2::geom_linerange() is used.
If TRUE and geom is "point" or "line",
sets the size of the predicted points. Default is 3.5.
Note the distinction from point.size, which refers to the
observed data points.
What size should be used for observed data when
plot.points is TRUE? Default is 1.
Ignored.
This is designed to offer more flexibility than the canned functions
(effect_plot(), interact_plot(), and cat_plot()), by letting you
generate your own predicted data and iteratively experiment with the
plotting options.
Note: predictions objects from make_predictions() store information
about the arguments used to create the object. Unless you specify those
arguments manually to this function, as a convenience plot_predictions
will use the arguments stored in the predictions object. Those arguments
are:
pred, modx, and mod2
resp
pred.values, modx.values, and mod2.values
pred.labels, modx.labels, and mod2.labels
data
interval
linearity.check
weights
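The workflow described above might look like the following sketch. It assumes an installed jtools version whose plot_predictions() matches the signature documented on this page; the model, the mtcars data, and the arguments passed to make_predictions() are assumptions made for the example, since make_predictions() is not documented here. The guard skips the plotting calls when the installed interface differs.

```r
# Fit an illustrative model with a two-way interaction (mtcars ships with R)
fit <- lm(mpg ~ hp * wt, data = mtcars)

# Guard: proceed only if the installed jtools exports both functions and
# plot_predictions() takes a `predictions` argument as documented here
has_api <- requireNamespace("jtools", quietly = TRUE) &&
  all(c("make_predictions", "plot_predictions") %in%
        getNamespaceExports("jtools")) &&
  "predictions" %in% names(formals(jtools::plot_predictions))

if (has_api) {
  # Assumed call: generate predicted values once
  preds <- jtools::make_predictions(fit, pred = "hp", modx = "wt")

  # No pred/modx/resp given: the values stored in `preds` are used
  p1 <- jtools::plot_predictions(preds)

  # Adjust plotting options without regenerating the predictions
  p2 <- jtools::plot_predictions(preds, x.label = "Horsepower",
                                 y.label = "Miles per gallon")
}
```

Because the predictions are computed once and reused, each change to a plotting option only re-runs the plotting step.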
Other plotting tools: make_predictions