Plots the prediction terms of a regression, together with the total prediction. The input variables of the regression can be numerical, categorical, logical and character, and are visualized by histograms, densities, or bar plots. The regression model can be linear or generalized linear. The regression formula in lm() or glm() may contain transformations and interactions. The plot shows all cases in the data, but can also be made for a single case, either in-sample or out-of-sample, to explain its prediction.
predsplot(fit, maxnpreds = 8, sort.by.stdev = TRUE, displaytype = "histogram",
totalpred.type = "response", trunc.totalpred = TRUE, casetoshow = NULL,
staircase = FALSE, verbose = TRUE, maxchar.level = 5, nfact = 8, main = NULL,
cex.main = 1, xlab = NULL, ylab = NULL, vlinewidth = 0.25, hlinewidth = 0.25,
drawborder = TRUE, borderwidth = 1.0, densitycolors = NULL, predupcolor = "red",
preddowncolor = "blue", predpointsize = 3, draw.segments = TRUE,
predsegmentwidth = 0.8, profile = FALSE, bw = "nrd0", adjust = 1)
A list with items
the predictions plot, which is a ggplot2 object.
vector with the total linear prediction of all cases for which it can be computed.
the centercept, which is the total linear prediction when all prediction terms have their average value.
matrix of cases by prediction terms.
data frame with the standard deviation of each prediction term and the total linear prediction.
a number, the total linear prediction for casetoshow if one is given.
a vector with the values of the prediction terms for casetoshow.
data frame which shows all prediction terms for casetoshow together with the centercept, total linear prediction, and for a glm fit also the total prediction in response units.
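How these items relate to each other can be illustrated with base R alone. The sketch below does not call predsplot(). It assumes that the centered terms returned by predict(fit, type = "terms") play the role of the prediction terms, and that their "constant" attribute plays the role of the centercept; whether predsplot() computes them in exactly this way is an assumption. The mtcars data and the model are chosen only for illustration.

```r
# Illustration with base R only; predsplot() is not called here.
fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# Matrix of cases by (centered) prediction terms.
terms_mat <- predict(fit, type = "terms")

# Total prediction when every term is at its average value,
# which is zero after centering.
center <- attr(terms_mat, "constant")

# Total linear prediction of each case: the constant plus the sum of its terms.
total <- center + rowSums(terms_mat)

# Standard deviation of each term and of the total, largest first.
sds <- sort(c(apply(terms_mat, 2, sd), total = sd(total)), decreasing = TRUE)
print(round(sds, 3))
```

In this sketch total agrees with fitted(fit), and each column of terms_mat averages to zero, so the constant is the prediction for a case whose terms are all at their average.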
maxnpreds: the maximal number of prediction terms to plot. When there are more prediction terms than this, those with smallest standard deviations are combined.
sort.by.stdev: if TRUE, sorts the prediction terms by decreasing standard deviation.
displaytype: when "histogram", the distributions of the numerical prediction terms and the total prediction are displayed as histograms. When "density", the default density estimate of R is plotted instead.
totalpred.type: when fit is a glm object, option "response" plots labels of the total prediction that are obtained by the inverse link function. Option "linear" plots labels of the total linear prediction. When fit is an lm object, argument totalpred.type is ignored.
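The difference between the two options can be sketched in base R, without calling predsplot(). The example assumes a logistic regression on the mtcars data, chosen only for illustration. The linear prediction is mapped to response units by the inverse link function stored in the fitted family.

```r
# Illustration with base R only; predsplot() is not called here.
fit <- glm(am ~ wt, family = binomial, data = mtcars)

# Total linear prediction, on the scale of the link function.
lin <- predict(fit, type = "link")

# The same predictions in response units, via the inverse link.
resp <- fit$family$linkinv(lin)

# Side by side: the two kinds of labels for the first few cases.
print(head(cbind(linear = lin, response = resp)))
```

For a binomial fit the response units are probabilities, so the "response" labels lie between 0 and 1 while the "linear" labels are unbounded.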
trunc.totalpred: if TRUE, the default, the range of the total prediction is truncated so ymin and ymax are determined by the individual prediction terms. FALSE may make the prediction terms look small in the plot.
casetoshow: if not NULL, the particular case to be displayed. This can be a case number or row name of a case in the dataset, or a list or vector with input values of a new case.
staircase: if TRUE and casetoshow is not NULL, the prediction for the case is shown in staircase style.
verbose: if TRUE, some intermediate results are shown on the console.
maxchar.level: only the first maxchar.level characters of the categorical levels are displayed.
nfact: if a numeric input variable has at most nfact unique values, it will be displayed as if it were a factor. This may be useful since R considers a binary variable to be numeric.
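As a rough illustration of this rule, the base R sketch below counts the unique values of each numeric variable in the mtcars data and lists those that would fall at or below a threshold of 8. The exact check performed inside predsplot() is an assumption here; the sketch only mirrors the description above.

```r
# Illustration with base R only; predsplot() is not called here.
nfact <- 8

# Number of distinct values of each (numeric) column.
n_unique <- sapply(mtcars, function(x) length(unique(x)))

# Variables with at most nfact distinct values.
as_factor <- names(n_unique)[n_unique <= nfact]
print(as_factor)
```

Binary indicators such as am and vs are numeric in mtcars, yet they end up in this list, which is the situation the argument is meant for.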
main: main title of the plot.
cex.main: size of the main title.
xlab: horizontal legend. Its size can be changed by setting the height and width of the plot.
ylab: vertical legend. Its size can be changed by setting the height and width of the plot.
vlinewidth: width of the vertical lines.
hlinewidth: width of the horizontal line.
drawborder: if TRUE, draws a box outside the entire plot.
borderwidth: width of the border. Defaults to 1.
densitycolors: a vector with 4 colors. The first is for numeric input variables pointing up, the second for numeric input variables pointing down, the third for prediction terms without orientation and the total prediction, and the fourth for factors. If NULL, the default colors are used.
predupcolor: the color of a positive prediction for casetoshow.
preddowncolor: the color of a negative prediction for casetoshow.
predpointsize: the size of the points displaying the predictions for casetoshow.
draw.segments: if TRUE, also plots a line segment from the center of each prediction term to the prediction term for casetoshow, in the same color as the prediction.
predsegmentwidth: the width of that line segment.
profile: when casetoshow is not NULL and staircase is FALSE, this plots the profile of the case with a faint grey line.
bw: the bandwidth of the density estimation, only used when displaytype = "density". This is the argument 'bw' of the function density.
adjust: multiplier of the bandwidth of the density estimation, only used when displaytype = "density". This is the argument 'adjust' of the function density.
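The effect of these two arguments can be seen directly in the function density of base R. The sketch below uses the mpg variable of mtcars, chosen only for illustration, and does not call predsplot().

```r
# Illustration with base R only; predsplot() is not called here.
x <- mtcars$mpg

# Default bandwidth rule, without adjustment.
d1 <- density(x, bw = "nrd0", adjust = 1)

# Same rule, with the bandwidth multiplied by 2 for a smoother estimate.
d2 <- density(x, bw = "nrd0", adjust = 2)

# The bandwidths actually used by the two estimates.
print(c(default = d1$bw, doubled = d2$bw))
```

Values of adjust above 1 give smoother curves, and values below 1 follow the data more closely.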
Rousseeuw, P.J.
Rousseeuw, P.J. (2025). Explainable Linear and Generalized Linear Models by the Predictions Plot. https://arxiv.org/abs/2412.16980v2 (open access).
predscor
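# Load the Titanic data and make its columns available by name.
data(data_titanic)
attach(data_titanic)
survival <- y
# Set the class levels in their order of appearance, and label the sexes F and M.
Pclass <- factor(Pclass, unique(Pclass))
Sex <- factor(Sex, labels = c("F","M"))
# Logistic regression of survival on the passenger characteristics.
fit <- glm(survival ~ Sex + Age + SibSp + Parch + Pclass, family=binomial)
# Predictions plot, with density estimates instead of histograms.
predsplot(fit, main = "Titanic data", displaytype = "density")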
# For more examples, we refer to the vignette:
if (FALSE) {
vignette("predsplot_examples")
}