Plot an rpart model.
This function combines and extends plot.rpart and text.rpart in the rpart package.
For an overview, please see the package vignette "Plotting rpart trees with the rpart.plot package".
The function rpart.plot provides a simplified interface to this function. First-time users should use rpart.plot rather than this function.
The arguments of this function are a superset of those of rpart.plot, and some of the arguments have different defaults. See the rpart.plot help page for a table showing the different defaults. (The defaults differ for historical reasons: for backwards compatibility the defaults of prp haven't changed, whereas the defaults of rpart.plot were changed when type="auto" and box.palette were introduced in version 2.0.0 of this package.)
prp(x=stop("no 'x' arg"), type=0, extra=0, under=FALSE,
    clip.right.labs=TRUE, nn=FALSE, ni=FALSE, yesno=TRUE,
    fallen.leaves=FALSE, branch=if(fallen.leaves) 1 else .2,
    uniform=TRUE, left=TRUE, xflip=FALSE, yflip=FALSE, Margin=0, space=1,
    gap=NULL, digits=2, varlen=-8, faclen=3, cex=NULL, tweak=1,
    compress=TRUE, ycompress=uniform, trace=FALSE, snip=FALSE, snip.fun=NULL,
    box.col=0, box.palette=0, border.col=col, round=NULL, leaf.round=NULL,
    shadow.col=0, prefix="", suffix="", xsep=NULL,
    under.font=font, under.col=1, under.cex=.8,
    split.cex=1, split.font=2, split.family=family, split.col=1,
    split.box.col=0, split.border.col=0, split.lty=1, split.lwd=NULL,
    split.round=0, split.shadow.col=0, split.prefix="",
    right.split.prefix=NULL, split.suffix="", right.split.suffix=NULL,
    facsep=",", eq=" = ", lt=" < ", ge=" >= ",
    branch.col=if(identical(branch.type, 0)) 1 else "gray", branch.lty=1,
    branch.lwd=NULL, branch.type=0, branch.tweak=1, min.branch.width=.002,
    branch.fill=branch.col,
    nn.cex=NULL, nn.font=3, nn.family="", nn.col=1, nn.box.col=0,
    nn.border.col=nn.col, nn.lty=1, nn.lwd=NULL, nn.round=.3,
    yes.text="yes", no.text="no",
    node.fun=internal.node.labs, split.fun=internal.split.labs, FUN="text",
    nspace=branch, minbranch=.3, do.par=TRUE, add.labs=TRUE,
    clip.left.labs=FALSE, fam.main="", yshift=0, yspace=space,
    shadow.offset=.4, split.adj=NULL, split.yshift=0, split.space=space,
    split.yspace=yspace, split.shadow.offset=shadow.offset,
    nn.adj=.5, nn.yshift=0, nn.space=.8, nn.yspace=.5,
    ygap=gap/2, under.ygap=.5, yesno.yshift=0,
    xcompact=TRUE, ycompact=uniform, xcompact.ratio=.8, min.inter.height=4,
    max.auto.cex=1, min.auto.cex=.15, ycompress.cex=.7, accept.cex=1.1,
    shift.amounts=c(1.5, 2), Fallen.yspace=.1, boxes.include.gap=FALSE,
    legend.x=NULL, legend.y=NULL, legend.cex=1, ...)
x: An rpart object. The only required argument.
type: Type of plot. Possible values:
  0 Default. Draw a split label at each split and a node label at each leaf.
  1 Label all nodes, not just leaves. Similar to text.rpart's all=TRUE.
  2 Like 1 but draw the split labels below the node labels. Similar to the plots in the CART book.
  3 Draw separate split labels for the left and right directions.
  4 Like 3 but label all nodes, not just leaves. Similar to text.rpart's fancy=TRUE. See also clip.right.labs.
extra: Display extra information at the nodes. Possible values:
  "auto" (case insensitive) Automatically select a value based on the model type, as follows:
    extra=106 class model with a binary response
    extra=104 class model with a response having more than two levels
    extra=100 other models
  0 Default. No extra information.
  1 Display the number of observations that fall in the node (per class for class objects; prefixed by the number of events for poisson and exp models). Similar to text.rpart's use.n=TRUE.
  2 Class models: display the classification rate at the node, expressed as the number of correct classifications and the number of observations in the node. Poisson and exp models: display the number of events.
  3 Class models: misclassification rate at the node, expressed as the number of incorrect classifications and the number of observations in the node.
  4 Class models: probability per class of observations in the node (conditioned on the node, sum across a node is 1).
  5 Class models: like 4 but do not display the fitted class.
  6 Class models: the probability of the second class only. Useful for binary responses.
  7 Class models: like 6 but do not display the fitted class.
  8 Class models: the probability of the fitted class.
  9 Class models: the probabilities times the fraction of observations in the node (the probability relative to all observations, sum across all leaves is 1).
  +100 Add 100 to any of the above to also display the percentage of observations in the node. For example extra=101 displays the number and percentage of observations in the node. Actually, it's a weighted percentage using the weights passed to rpart.
  Note 1: Unlike text.rpart, by default prp uses its own routine for generating node labels (not the function attached to the object). See the node.fun argument.
  Note 2: The extra argument has special meaning for mvpart objects. See the Appendix to this package's vignette.
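The effect of extra can be seen by plotting the same tree twice. This is a minimal sketch, assuming the rpart.plot package (which attaches rpart and supplies the ptitanic data) is installed; the variable names tree, p1 and p2 are only illustrative.

```r
library(rpart.plot)   # attaches rpart and provides the ptitanic data
data(ptitanic)
tree <- rpart(survived ~ ., data = ptitanic, cp = .02)

old.par <- par(mfrow = c(1, 2))                      # two plots side by side
p1 <- prp(tree, extra = 1,   main = "extra = 1")     # counts per class
p2 <- prp(tree, extra = 106, main = "extra = 106")   # prob of second class plus percent of obs
par(old.par)
```

Because this is a class model with a binary response, extra=106 is also the value that extra="auto" selects according to the table above.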
under: Applies only if extra > 0. Default FALSE, meaning put the extra text in the box. Use TRUE to put the text under the box. See also under.cex.
clip.right.labs: Default TRUE, meaning "clip" the right-hand split labels, i.e. do not print variable=. Applies only if type=3 or 4. See also clip.left.labs.
nn: Display the node numbers. Default FALSE. (In the current implementation some overplotting may occur with nn=TRUE.)
ni: Display the node indices, i.e. the row numbers of the nodes in the object's frame. Default FALSE.
yesno: Possible values:
  0 do not write yes and no on the tree
  1 (default) write yes and no at the top split
  2 write yes and no at all splits. Seems to work best with fallen.leaves=TRUE, split.border.col=1.
  (The yesno argument is ignored if type=3 or 4. Use nn.col and the other nn parameters to change the color etc. of the yes and no text. Use yes.text and no.text to change the actual text displayed.)
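As a small illustration of the yesno settings described above, the following sketch writes the labels at every split and shortens their text. It assumes rpart.plot is installed; the replacement strings "y" and "n" are arbitrary choices for the example.

```r
library(rpart.plot)   # attaches rpart and provides the ptitanic data
data(ptitanic)
tree <- rpart(survived ~ ., data = ptitanic, cp = .02)

# yesno=2 labels every split; the help text suggests combining it with
# fallen.leaves=TRUE and a visible split border.
p <- prp(tree, yesno = 2, fallen.leaves = TRUE, split.border.col = 1,
         yes.text = "y", no.text = "n")
```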
fallen.leaves: Default FALSE. If TRUE, position the leaf nodes at the bottom of the graph.
branch: Controls the shape of the branch lines. Specify a value between 0 (V shaped branches) and 1 (square shouldered branches). Default is if(fallen.leaves) 1 else .2.
uniform: If TRUE (the default), the vertical spacing of the nodes is uniform. If FALSE, the nodes are spaced proportionally to the fit (more precisely, to the difference between a node's deviance and the sum of its two children's deviances). Very small vertical spaces are automatically artificially expanded to make room for the labels, see minbranch. Note: uniform=FALSE with cex=NULL (the default) can sometimes cause very small text.
left: Default TRUE, meaning the left side of a split is the path taken if the split condition is true. With left=FALSE the split labels are changed so the right side is true.
xflip: Default FALSE. If TRUE, flip the tree horizontally.
yflip: Default FALSE. If TRUE, flip the tree vertically, so the root is at the bottom.
Margin: Extra white space around the tree, as a fraction of the graph width. Default 0, meaning no extra space. To add say 10% space around the tree use Margin=0.1. (This is the margin argument of plot.rpart. The name was changed to prevent partial matching with mar, which can be passed in as a ... argument.)
gap: Minimum horizontal gap between the (possibly invisible) boxes, in character widths. Default NULL, meaning automatically choose a suitable value (normally 1, but if the graph is very crowded will be set to 0, permitting boxes to touch to allow a bigger cex). See also space.
digits: The number of significant digits in displayed numbers. Default 2. If 0, use getOption("digits"). Details: Numbers from 0.001 to 9999 are printed without an exponent (and the number of digits is actually only a suggestion, see format for details). Numbers outside that range are printed with an "engineering" exponent (a multiple of 3).
varlen: Length of variable names in text at the splits (and, for class responses, the class in the node label). Default -8, meaning truncate to eight characters. Possible values:
  0 use full names.
  greater than 0 call abbreviate with the given varlen.
  less than 0 truncate variable names to the shortest length where they are still unique, but never truncate to shorter than abs(varlen).
faclen: Length of factor level names in splits. Default 3, meaning abbreviate to three characters. Possible values are as varlen above, except that for back-compatibility with text.rpart the special value 1 means represent the factor levels with alphabetic characters (a for the first level, b for the second, etc.).
cex: Default NULL, meaning calculate the text size automatically.
tweak: Adjust the (possibly automatically calculated) cex. Default 1, meaning no adjustment. Use say tweak=1.2 to make the text 20% larger. However, since font sizes are discrete, the cex you ask for may not be exactly the cex you get. A small change to tweak may not actually change the type size, or may change it more than you want.
compress: If TRUE (the default), make more space by shifting nodes horizontally where space is available. This often allows larger text. (This is the same as plot.rpart's argument of the same name, except that here the default is TRUE.)
ycompress: If TRUE (the default unless uniform=FALSE), make more space by shifting labels vertically where space is available. Actually, this only kicks in if the initial automatically calculated cex is less than 0.7. Use ycompress=FALSE if you feel the resulting display is too messy. In the current implementation, the shifting algorithm works a little better (allowing larger text) with type=1, 2, or 3.
trace: Default FALSE. Use TRUE to print the automatically calculated cex, xlim, and ylim. Use integer values greater than 1 for more detailed tracing.
snip: Default FALSE. Set TRUE to interactively trim the tree with the mouse. See the package vignette (or just try it).
snip.fun: Function invoked after each mouse click when snip=TRUE. Default NULL, meaning no function. Otherwise set snip.fun to your own function with the prototype function(tree), where tree is the snipped tree. See the package vignette for an example.
The following control the node labels.
box.col: Color of the boxes around the text. Default 0, meaning use the background color. This argument overrides the box.palette argument.
box.palette: Palette for coloring the node boxes based on the fitted value. This is a vector of colors, for example box.palette=c("Red", "Yellow", "Green"). Small fitted values are displayed with colors at the start of the vector; large values with colors at the end.
  The special value box.palette=0 (default) uses the background color (typically white).
  The special value box.palette="auto" (case insensitive) automatically selects a predefined palette based on the type of model.
  Otherwise specify a predefined palette, e.g. box.palette="Grays" for the predefined gray palette (a range of grays). The predefined palettes are:
    Blues Browns Grays Greys Greens Oranges Reds Purples
    Bu Bn Gn Gy Or Rd Pu (alternative names for the above palettes)
    BuGn GnRd BuOr etc. (two color palettes: any combination of two palettes)
    RdYlGn GnYlRd (three color palettes)
  Prefix the palette name with - to reverse the order of the colors, e.g. box.palette="-auto" or box.palette="-Grays".
  The box.palette argument is ignored if the box.col argument is specified.
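The three ways of specifying box.palette listed above (a vector of colors, a predefined palette name, and a reversed palette) can be compared side by side. This is a sketch only, assuming rpart.plot is installed; the choice of extra=6 and of these particular palettes is arbitrary.

```r
library(rpart.plot)   # attaches rpart and provides the ptitanic data
data(ptitanic)
tree <- rpart(survived ~ ., data = ptitanic, cp = .02)

old.par <- par(mfrow = c(1, 3))                       # three plots on one page
pa <- prp(tree, extra = 6, main = "vector of colors",
          box.palette = c("Red", "Yellow", "Green"))  # small fitted values red, large green
pb <- prp(tree, extra = 6, main = "Grays",
          box.palette = "Grays")                      # predefined gray palette
pc <- prp(tree, extra = 6, main = "-RdYlGn",
          box.palette = "-RdYlGn")                    # leading - reverses the palette
par(old.par)
```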
border.col: Color of the box border around the text. Default col, the color of the text in the box. Use 0 for no border. (Note: par settings like col can be passed in as ... arguments. If not passed in, par("col") is used.)
round: Controls the rounding of the corners of the node boxes. Default NULL, meaning calculate automatically. Else specify 0 for sharp edges, and values greater than 0 for rounded edges. Bigger is more round. Values too big for the size of the box get silently reduced.
leaf.round: Controls the rounding of the corners of the leaf node boxes. Default NULL, meaning use round. Else specify a value greater than or equal to 0.
shadow.col: Color of the shadow under the boxes. Default 0, no shadow. Try "gray" or "darkgray". (Note: overlapping shadows look better on devices that support alpha channels. If you get the message "Warning: semi-transparency is not supported" please let me know; it means that a fix is needed to the code that determines if the device supports alpha channels.)
prefix: Default "". Prepend this string to the node labels. So could be the name of the fitted response, for instance.
suffix: Default "". Append this string to the node labels. Text after a double newline "\n\n" (if any) will be plotted under the box. (Actually, double newlines can be used in any of the prefix or suffix arguments for this purpose.)
xsep: String which separates the individual counts and probabilities in node labels when extra>0. Default NULL, meaning automatically select: usually "  " (two spaces), but " / " for rates. Use xsep="/" for compatibility with text.rpart. See also facsep, which separates the factor levels in split labels.
The following control the text under the boxes (apply only if under=TRUE or there is a double newline \n\n in prefix or suffix).
under.font: Font for the text under the box. Default font (which can be passed in as a ... argument).
under.col: Color of the text under the box. Default 1.
under.cex: Size of the text under the box relative to the text in the box. Default .8, smaller than the text in the box.
The following control the split labels.
split.cex: Size of the split label text relative to cex (which by default is calculated automatically). Default 1.
split.font: Font for the split labels. Default 2, bold. (Note: use font to change the node label text.)
split.family: Font family for the split labels. Default "", or use something like split.family="serif". (Note: use family to change the node label text.)
split.col: Color of the split label text. Default 1. (Note: use col to change the node label text.)
split.box.col: Color of the split boxes. Default 0, meaning use the background color.
split.border.col: Color of the split box borders. Default 0, invisible.
split.lty: Line type of the split box border. Default 1, but the border will be invisible unless you change the default split.border.col. (Note: use lty to change the node box borders.)
split.lwd: Line width of the split box border relative to cex (which by default is calculated automatically). The border is by default invisible, see split.border.col.
split.round: Controls the rounding of the corners of the split boxes. Default 0, meaning sharp corners. Else specify a value greater than or equal to 0. (The split boxes are by default invisible, see split.box.col and split.border.col.)
split.shadow.col: Color of the shadow under the split boxes. Default 0, no shadow.
split.prefix: Default "". Prepend this string to the split labels.
right.split.prefix: Default NULL, meaning use split.prefix. Prepend this string to the right split labels. Applies only when type=3 or 4.
split.suffix: Default "". Append this string to the split labels.
right.split.suffix: Default NULL, meaning use split.suffix. Append this string to the right split labels. Applies only when type=3 or 4.
facsep: Default ",". String which separates the factor levels in split labels. See also xsep, which separates the individual counts when extra is used.
eq: Default " = ". String which separates the factor name from the levels in split labels. The idea is that you can add or remove spaces around the =, or use words if that suits you.
lt: Default " < ". String which represents "less than" in split labels.
ge: Default " >= ". String which represents "greater than or equal" in split labels.
The following control the branches.
branch.col: Color of the branch lines. Default 1, but set to "gray" if branch.type is nonzero.
branch.lty: Line type of the branch lines. Default 1.
branch.lwd: Line width of the branch lines relative to cex (which by default is calculated automatically). (Note: branch.lwd does not control the width of the "wide branches" drawn when branch.type is nonzero.)
branch.type: Default 0. If nonzero draw "wide branches", with branch widths proportional to the parameter selected by branch.type as follows:
  0 The default. The branch lines are drawn conventionally.
  1 deviance
  2 sqrt(deviance)
  3 deviance / nobs
  4 sqrt(deviance / nobs) (the standard deviation when method="anova")
  5 weight (frame$wt). This is the number of observations at the node, unless rpart's weight argument was used.
  6 complexity
  7 abs(predicted value)
  8 predicted value - min(predicted value)
  9 constant (for checking visual perception of the relative width of branches).
  Otherwise set branch.type to your own function. The function should take a single argument x (the rpart object) and return a numeric vector of non-negative widths corresponding to rows in frame. See get.branch.widths in the source code.
  Note: with a nonzero branch.type, in the current implementation the branch argument will be silently changed to 1 (if branch > .5) or 0 (if branch < .5).
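A user-supplied branch.type function, as described above, might look like the following sketch. It assumes rpart.plot is installed; obs.widths is a hypothetical helper written for this example (it is not part of the package) that returns one non-negative width per row of the frame, here simply the number of observations at each node.

```r
library(rpart.plot)   # attaches rpart and provides the ptitanic data
data(ptitanic)
tree <- rpart(survived ~ ., data = ptitanic, cp = .02)

# Hypothetical helper: takes the rpart object and returns a numeric
# vector of non-negative widths, one per row of x$frame.
obs.widths <- function(x) x$frame$n

old.par <- par(mfrow = c(1, 2))
prp(tree, branch.type = 5, main = "branch.type = 5")                    # built-in: weight
prp(tree, branch.type = obs.widths, main = "branch.type = obs.widths")  # user function
par(old.par)
```

Since no weights were passed to rpart here, the two plots would be expected to look much the same; the function form is useful when none of the built-in quantities is what you want to show.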
branch.tweak: Default 1. Applies only if branch.type is nonzero. Use this argument to scale the widths of the branches, for example, branch.tweak=.5 to halve the width of the branches. (By default, prp normalizes the widths so the widest branch is one-fifth the plot width.)
min.branch.width: Default 0.002. Applies only if branch.type is nonzero. The minimum width of a branch, as a fraction of the page width. The width of branches that would be thinner than min.branch.width is clamped. Increase min.branch.width if the thinnest branches are too skinny on your display device.
branch.fill: Fill color of the wide branches. Applies only if branch.type is nonzero. Default branch.col.
The following control the node numbers (with nn=TRUE).
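nn.cex: Default NULL, meaning calculate the cex of the node numbers automatically. This and the following arguments apply only when nn=TRUE.
nn.font: Font for the node numbers. Default 3, italic.
nn.family: Font family for the node numbers. Default "".
nn.col: Color of the node number text. Default 1.
nn.box.col: Color of the boxes around the node numbers. Default 0, meaning use the background color.
nn.border.col: Color of the node number box border. Default nn.col.
nn.lty: Line type of the node number box border. Default 1.
nn.lwd: Line width of the node number box border relative to cex (which by default is calculated automatically). Default NULL, meaning use lwd (which can be passed in as a ... argument).
nn.round: Controls the rounding of the corners of the node number boxes. Default .3, meaning small corners. Else specify a value greater than or equal to 0.
yes.text, no.text: Text for the yes and no labels. Apply only if yesno=TRUE. Default yes.text="yes" and no.text="no".
The following are user definable functions.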
node.fun: The function that generates the text at the node labels. The default is internal.node.labs, which specifies a function internal to prp. This is necessary for full support of extra as described in the section on extra above. Otherwise set node.fun to your own function with the prototype function(x, labs, digits, varlen). See the package vignette for details.
split.fun: The function that generates the text at the splits. The default is internal.split.labs, which specifies a function internal to prp. Otherwise set split.fun to your own function with the prototype function(x, labs, digits, varlen, faclen).
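User-defined label functions with the prototypes given above might look like the following sketch. It assumes rpart.plot is installed; node.fun1 and split.fun1 are hypothetical names chosen for this example, and both simply post-process the labels that prp has already generated (the labs argument).

```r
library(rpart.plot)   # attaches rpart and provides the ptitanic data
data(ptitanic)
tree <- rpart(survived ~ ., data = ptitanic, cp = .02)

# Hypothetical node.fun: keep prp's own label and append the node deviance.
# Returns one label per row of x$frame.
node.fun1 <- function(x, labs, digits, varlen) {
    paste0(labs, "\ndev ", x$frame$dev)
}

# Hypothetical split.fun: move the factor levels onto their own line
# by replacing the default " = " separator.
split.fun1 <- function(x, labs, digits, varlen, faclen) {
    gsub(" = ", ":\n", labs)
}

p <- prp(tree, extra = 1, node.fun = node.fun1, split.fun = split.fun1)
```

Building on labs rather than generating labels from scratch keeps whatever the extra argument already put in the label.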
FUN: The function that displays the text. Default text.
The following are esoteric parameters, mostly for the graph layout engine.
nspace: Applies only when compress=TRUE. Default nspace=branch. The size of the space between a split and a leaf, relative to the space between leaves.
minbranch: Applies only when uniform=FALSE. Default .3. The minimum height between levels is clamped at minbranch times the mean interlevel distance. Needed because sometimes a split gives little or no improvement in deviance, and an interlevel distance strictly proportional to the improvement would leave no room for the label.
do.par: Default TRUE, meaning adjust the mar parameter so the tree fills the figure region. This also sets xpd=NA. These graphic parameters are restored to their original state before prp exits. If you explicitly set mar or xpd, prp will use your setting regardless of the setting of do.par.
add.labs: Default TRUE, meaning display the labels. If FALSE, gives a bare bones display similar to plot.rpart.
clip.left.labs: Like clip.right.labs but for the left labels. Default is FALSE. Note that clip.left.labs and clip.right.labs can be vectors, indexed on the split number.
fam.main: Font family for the main text. Default "". The (inconsistent) name was chosen to minimize partial matching with main and family, which can be passed in as ... arguments.
yshift: Vertical shift of the labels. Default 0. Negative values move the text down; positive up (the box around the text will follow along).
space: Horizontal space to the left and right of the node label text. Default 1. Use this (and yspace) for bigger boxes. Since this affects the size of the (possibly invisible) boxes, it also affects the graph layout and hence also the automatic calculation of cex.
yspace: Vertical space above and below the node label text. Default space. See the comments for space.
shadow.offset: Offset of the shadow from the box. Default .4 (but the shadow will be invisible unless the default shadow.col is changed).
split.adj: Horizontal position of the split label text. Default NULL, meaning use adj (which defaults to 0.5 but can be passed in as a ... argument). Use values less/more than .5 to shift the text left/right (the box around the text will follow along).
split.yshift: Vertical shift of the split labels. Default 0. Negative values move the text down; positive up (the box around the text will follow along). This adjusts the positions of the split labels relative to the node labels. (Use yshift if you want to shift both the split and node labels.)
split.space: Default space. Affects the size of the box drawn around the text. The split boxes are by default invisible (see split.box.col and split.border.col), but nevertheless affect the graph layout used in the automatic calculation of cex.
split.yspace: Default yspace.
split.shadow.offset: Default shadow.offset (but the shadow will be invisible unless the default split.shadow.col is changed).
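nn.adj: Default .5.
nn.yshift: Default 0.
nn.space: Default .8.
nn.yspace: Default .5.
under.ygap: Default .5 (applies only if under=TRUE or there is a double newline in prefix or suffix). Vertical gap (in char heights) between the lower edge of the box and the top of the text under the box.
yesno.yshift: Default 0. Applies only when yesno=TRUE.
ygap: Vertical gap between boxes. Default gap/2.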
xcompact: If TRUE (the default) and there is too much white space, automatically change xlim to compact the entire tree horizontally. This usually only activates for small trees. (The xcompact and ycompact arguments compact the tree as a whole, whereas the compress and ycompress arguments move parts of the tree into available space.)
ycompact: If TRUE (the default) and there is too much vertical space, automatically change ylim to compact the entire tree vertically.
xcompact.ratio: Default .8. Applies only when xcompact=TRUE. The maximum possible without overplotting is 1, but compacting by .8 usually gives more pleasing spacing (it gives more space).
min.inter.height: Default 4. Applies only when ycompact=TRUE. Minimum height (in units of character height) between the lowest label in a layer and the highest label in the layer below it.
max.auto.cex: Clamp the automatically calculated cex at this value. Default 1, meaning never expand cex, only contract when necessary.
min.auto.cex: Default .15. Never downscale to less than this when automatically calculating cex, even if overplotted labels result. (The graph layout algorithm is unstable with cex values below 0.15, meaning that the automatic type size may be smaller than necessary.)
ycompress.cex: Default .7. Applies only when ycompress=TRUE. Apply the ycompress algorithm if the initial automatically calculated cex is less than this. The idea is that we don't want to shift if we get an acceptable cex without shifting. Make Inf to always attempt shifting.
accept.cex: Accept a shift only if it improves the automatically calculated cex by at least this factor (because we don't want to shift if it gives only a small improvement in cex). Default 1.1, i.e. require at least a 10% improvement. Use 0 to always accept shifts and Inf to never accept (or use ycompress=FALSE).
shift.amounts: Default c(1.5, 2). For ycompress, choose the best cex yielded by shifting nodes by these amounts, in multiples of the box heights (after initial scaling).
Fallen.yspace: Default .1, meaning allow 10% of the vertical space for the fallen leaves. (The name Fallen.yspace uses upper case to avoid partial matching with fallen.leaves.)
boxes.include.gap: Default FALSE. Include gap and ygap when drawing the boxes, for debugging purposes. (To draw the boxes, see box.col, border.col, split.box.col, and split.border.col.) This argument only affects the way the boxes are drawn, not the graph layout algorithm in any way. With the optimum cex at least one pair of boxes displayed in this manner will just touch (but none will overlap).
legend.x: Horizontal position of the legend, normally between 0 and 1, although values beyond those limits are often useful. Default is NULL, meaning automatically position the legend (assuming there is enough space). Use NA for no legend.
legend.y: Like legend.x but for the vertical position of the legend.
legend.cex: Relative size of the legend text. Default is 1.
...: Extra par arguments. Only the "important" par arguments are supported. Note that arguments like col apply only to the node labels. To affect the split labels or branch lines, use split.col and branch.col instead.
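prp returns a list with components including:
obj: The rpart object. Identical to the x argument passed in unless snip was used.
snipped.nodes: NULL unless snip was used.
cex: The cex of the node labels.
split.cex: The cex of the split labels.
See also rpart.plot, and these functions in the rpart package: plot.rpart, text.rpart, rpart.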
library(rpart.plot)              # attaches rpart and provides the ptitanic data
data(ptitanic)
tree <- rpart(survived ~ ., data=ptitanic, cp=.02)
                                 # cp=.02 because want small tree for demo

old.par <- par(mfrow=c(2,2))     # put 4 figures on one page

prp(tree, main="default prp\n(type = 0, extra = 0)")

prp(tree, main="type = 4, extra = 6\nbox.palette = \"auto\"",
    type=4, extra=6,             # label all nodes, show prob of second class
    box.palette="auto",          # auto color the nodes based on the model type
    faclen=0)                    # faclen=0 to print full factor names

cols <- ifelse(tree$frame$yval == 1, "darkred", "green4")
                                 # green if survived

prp(tree, main="assorted arguments",
    extra=106,                   # display prob of survival and percent of obs
    nn=TRUE,                     # display the node numbers
    fallen.leaves=TRUE,          # put the leaves on the bottom of the page
    shadow.col="gray",           # shadows under the leaves
    branch.lty=3,                # draw branches using dotted lines
    branch=.5,                   # change angle of branch lines
    faclen=0,                    # faclen=0 to print full factor names
    trace=1,                     # print the automatically calculated cex
    split.cex=1.2,               # make the split text larger than the node text
    split.prefix="is ",          # put "is " before split text
    split.suffix="?",            # put "?" after split text
    col=cols, border.col=cols,   # green if survived
    split.box.col="lightgray",   # lightgray split boxes (default is white)
    split.border.col="darkgray", # darkgray border on split boxes
    split.round=.5)              # round the split box corners a tad

# compare to the plotting functions in the rpart package
plot(tree, uniform=TRUE, compress=TRUE, branch=.2)
text(tree, use.n=TRUE, cex=.8, xpd=NA) # cex is a guess, depends on your window size
title("compare to the plotting functions\nin the rpart package", cex.sub=.8)

par(old.par)