classmap (version 1.2.6)

predsplot: Make a predictions plot

Description

Plots the prediction terms of a regression, together with the total prediction. The input variables of the regression can be numerical, categorical, logical, or character, and are visualized by histograms, densities, or bar plots. The regression model can be linear or generalized linear, and the formula passed to lm() or glm() may contain transformations and interactions. By default the plot shows all cases in the data, but it can also be made for a single case, either in-sample or out-of-sample, to explain that case's prediction.

Usage

predsplot(fit, maxnpreds = 8, sort.by.stdev = TRUE, displaytype = "histogram",
totalpred.type = "response", trunc.totalpred = TRUE, casetoshow = NULL,
staircase = FALSE, verbose = TRUE, maxchar.level = 5, nfact = 8, main = NULL,
cex.main = 1, xlab = NULL, ylab = NULL, vlinewidth = 0.25, hlinewidth = 0.25,
drawborder = TRUE, borderwidth = 1.0, densitycolors = NULL, predupcolor = "red",
preddowncolor = "blue", predpointsize = 3, draw.segments = TRUE,
predsegmentwidth = 0.8, profile = FALSE, bw = "nrd0", adjust = 1)

Value

A list with items

p

the predictions plot, which is a ggplot2 object.

totpred

vector with the total linear prediction of all cases for which it can be computed.

centercept

the centercept, which is the total linear prediction when all prediction terms have their average value.

predterms

matrix of cases by prediction terms.

predsummary

data frame with the standard deviation of each prediction term and the total linear prediction.

casetotpred

a number, the total linear prediction for casetoshow if one is given.

casepredterms

a vector with the values of the prediction terms for casetoshow.

casesummary

data frame which shows all prediction terms for casetoshow together with the centercept, total linear prediction, and for a glm fit also the total prediction in response units.
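
The relation between the returned items can be sketched in base R without the package. The sketch below assumes that the prediction terms correspond to the centered terms returned by predict(fit, type = "terms") and that the centercept corresponds to its "constant" attribute; the source does not state how predsplot computes them internally, so treat this as an illustration of the decomposition rather than the package's implementation. It uses the built-in mtcars data, not the package's own example.

```r
# Base-R sketch (assumption: mirrors, but is not, what predsplot computes).
fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# Matrix of cases by centered prediction terms (compare 'predterms').
terms_mat <- predict(fit, type = "terms")

# Prediction when every term is at its average value (compare 'centercept').
centercept <- attr(terms_mat, "constant")

# Total prediction = centercept + sum of the terms (compare 'totpred').
totpred <- centercept + rowSums(terms_mat)

# Standard deviation of each term and of the total (compare 'predsummary').
predsummary <- data.frame(
  stdev = c(apply(terms_mat, 2, sd), total = sd(totpred))
)
print(predsummary)
```

Under these assumptions the reconstructed total agrees with fitted(fit), and the centercept equals the mean fitted value.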

Arguments

fit

an output object of lm or glm.

maxnpreds

the maximal number of prediction terms to plot. When there are more prediction terms than this, those with smallest standard deviations are combined.

sort.by.stdev

if TRUE, sorts the prediction terms by decreasing standard deviation.
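
A rough base-R sketch of sorting terms by standard deviation and combining the smallest ones follows. The source only says that the terms with the smallest standard deviations "are combined"; summing them into a single column, and keeping max_terms - 1 individual terms, are assumptions made here for illustration. The name max_terms is a hypothetical stand-in for maxnpreds.

```r
# Sketch only: the combining rule (a plain sum) is an assumption.
fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
terms_mat <- predict(fit, type = "terms")

# Order the terms by decreasing standard deviation.
term_sd <- apply(terms_mat, 2, sd)
ord <- order(term_sd, decreasing = TRUE)

max_terms <- 3  # hypothetical stand-in for maxnpreds

# Keep the largest terms and sum the remaining ones into one column.
keep <- ord[seq_len(max_terms - 1)]
rest <- ord[-seq_len(max_terms - 1)]
combined <- cbind(
  terms_mat[, keep, drop = FALSE],
  other = rowSums(terms_mat[, rest, drop = FALSE])
)
print(colnames(combined))
```

Because the small terms are summed rather than dropped, the row sums of the combined matrix still equal those of the full term matrix.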

displaytype

when "histogram", the distributions of the numerical prediction terms and the total prediction are displayed as histograms. When "density", the default density estimate of R is plotted instead.

totalpred.type

when fit is a glm object, option "response" plots labels of the total prediction that are obtained by the inverse link function. Option "linear" plots labels of the total linear prediction. When fit is an lm object, argument totalpred.type is ignored.
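
The difference between the two scales can be seen with base R alone. This sketch uses the built-in mtcars data (not the package's example) and shows that the response-scale prediction is the inverse link applied to the linear prediction.

```r
# Logistic regression on a built-in dataset.
fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)

# Total linear prediction (link scale), as with totalpred.type = "linear".
eta <- predict(fit, type = "link")

# Prediction in response units, as with totalpred.type = "response".
mu <- predict(fit, type = "response")

# The inverse link function maps one scale to the other.
mu_from_eta <- family(fit)$linkinv(eta)

head(cbind(linear = eta, response = mu))
```

For a binomial fit with the default logit link, the response values are probabilities between 0 and 1, whereas the linear values are unbounded.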

trunc.totalpred

if TRUE, the default, the range of the total prediction is truncated so ymin and ymax are determined by the individual prediction terms. FALSE may make the prediction terms look small in the plot.

casetoshow

if not NULL, the particular case to be displayed. This can be a case number or row name of a case in the dataset, or a list or vector with input values of a new case.
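
What a single-case explanation contains can be sketched in base R. The example below decomposes the prediction for a new (out-of-sample) case into centered terms; it assumes these correspond to the quantities predsplot reports for casetoshow, which the source does not confirm. The data frame new_case is a made-up illustration on the built-in mtcars data.

```r
# Sketch under assumptions: base-R decomposition of one case's prediction.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# A hypothetical new case with values for every input variable.
new_case <- data.frame(wt = 3.0, hp = 150)

# Terms for the new case, centered at the training averages.
case_terms <- predict(fit, newdata = new_case, type = "terms")
centercept <- attr(case_terms, "constant")

# Total prediction for the case = centercept + sum of its terms.
case_total <- centercept + sum(case_terms)

print(case_terms)
print(case_total)
```

A positive term pushes the case's prediction above the centercept and a negative term pushes it below, which is what the predupcolor and preddowncolor arguments distinguish in the plot.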

staircase

if TRUE and casetoshow is not NULL, the prediction for the case is shown in staircase style.

verbose

if TRUE, some intermediate results are shown on the console.

maxchar.level

only the first maxchar.level characters of the categorical levels are displayed.

nfact

if a numeric input variable has at most nfact unique values, it will be displayed as if it were a factor. This may be useful since R considers a binary variable to be numeric.
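
The rule can be illustrated with a small helper. The function treat_as_factor below is hypothetical and not part of classmap; in particular, how predsplot counts missing values is not stated in the source.

```r
# Hypothetical helper illustrating the nfact rule; not a classmap function.
treat_as_factor <- function(x, nfact = 8) {
  is.numeric(x) && length(unique(x)) <= nfact
}

treat_as_factor(mtcars$am)   # binary 0/1 variable stored as numeric
treat_as_factor(mtcars$cyl)  # three distinct values
treat_as_factor(mtcars$mpg)  # many distinct values
```

With the default nfact = 8, the binary variable am and the three-valued cyl would be displayed like factors, while mpg would be treated as numeric.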

main

main title of the plot.

cex.main

size of the main title.

xlab

horizontal legend. Its size can be changed by setting the height and width of the plot.

ylab

vertical legend. Its size can be changed by setting the height and width of the plot.

vlinewidth

width of the vertical lines.

hlinewidth

width of the horizontal line.

drawborder

if TRUE, draws a box outside the entire plot.

borderwidth

width of the border. Defaults to 1.

densitycolors

a vector with 4 colors. The first is for numeric input variables pointing up, the second for numeric input variables pointing down, the third for prediction terms without orientation and the total prediction, and the fourth for factors. If NULL, the default colors are used.

predupcolor

the color of a positive prediction for casetoshow.

preddowncolor

the color of a negative prediction for casetoshow.

predpointsize

the size of the points displaying the predictions for casetoshow.

draw.segments

if TRUE, also plots a line segment from the center of each prediction term to the prediction term for casetoshow, in the same color as the prediction.

predsegmentwidth

the width of that line segment.

profile

when casetoshow is not NULL and staircase is FALSE, this plots the profile of the case with a faint grey line.

bw

the bandwidth of the density estimation, only used when displaytype = "density". This is the argument 'bw' of the function density.

adjust

multiplier of the bandwidth of the density estimation, only used when displaytype = "density". This is the argument 'adjust' of the function density.
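
Since bw and adjust are passed on to the base R function density, their effect can be checked there directly. The sketch uses the built-in mtcars data for illustration.

```r
x <- mtcars$mpg

# Default bandwidth rule, no adjustment.
d1 <- density(x, bw = "nrd0", adjust = 1)

# Same rule with the bandwidth doubled, giving a smoother curve.
d2 <- density(x, bw = "nrd0", adjust = 2)

# The bandwidth actually used is adjust times the selected bandwidth.
c(d1$bw, d2$bw)
```

Values of adjust above 1 smooth the estimate and values below 1 make it follow the data more closely.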

Author

Rousseeuw, P.J.

References

Rousseeuw, P.J. (2025). Explainable Linear and Generalized Linear Models by the Predictions Plot. https://arxiv.org/abs/2412.16980v2 (open access).

See Also

predscor

Examples

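library(classmap)  # provides predsplot() and the data_titanic dataset
data(data_titanic)
attach(data_titanic)  # makes the columns available by name for the formula below
survival <- y
# Convert the class and sex variables to factors.
Pclass <- factor(Pclass, unique(Pclass))
Sex <- factor(Sex, labels = c("F", "M"))
# Logistic regression of survival on the passenger characteristics.
fit <- glm(survival ~ Sex + Age + SibSp + Parch + Pclass, family = binomial)
# Predictions plot with density displays instead of histograms.
predsplot(fit, main = "Titanic data", displaytype = "density")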

# For more examples, we refer to the vignette:
if (FALSE) {
vignette("predsplot_examples")
}
