prp: Plot an `rpart` model. A superset of `rpart.plot`.

Description

Plot an rpart model. The arguments of this function are a superset of those of rpart.plot. See ../doc/prp.pdf for an overview.

Usage

prp(x=stop("no 'x' arg"),
    type=0, extra=0, under=FALSE, clip.right.labs=TRUE,
    nn=FALSE, ni=FALSE, yesno=TRUE,
    fallen.leaves=FALSE, branch=if(fallen.leaves) 1 else .2,
    uniform=TRUE, left=TRUE, xflip=FALSE, yflip=FALSE,
    Margin=0, space=1, gap=NULL,
    digits=2, varlen=-8, faclen=3,
    cex=NULL, tweak=1,
    compress=TRUE, ycompress=uniform,
    trace=FALSE, snip=FALSE, snip.fun=NULL,
    box.col=0, border.col=col,
    round=NULL, leaf.round=NULL,
    shadow.col=0, prefix="", suffix="", xsep=NULL,
    under.font=font, under.col=1, under.cex=.8,
    split.cex=1, split.font=2, split.family=family, split.col=1,
    split.box.col=0, split.border.col=0,
    split.lty=1, split.lwd=NULL, split.round=0,
    split.shadow.col=0,
    split.prefix="", right.split.prefix=NULL,
    split.suffix="", right.split.suffix=NULL,
    facsep=",", eq=" = ", lt=" < ", ge=" >= ",
    branch.col=if(identical(branch.type, 0)) 1 else "gray",
    branch.lty=1, branch.lwd=NULL,
    branch.type=0, branch.tweak=1,
    min.branch.width=.002, branch.fill=branch.col,
    nn.cex=NULL, nn.font=3, nn.family="", nn.col=1,
    nn.box.col=0, nn.border.col=nn.col,
    nn.lty=1, nn.lwd=NULL, nn.round=.3,
    node.fun=internal.node.labs,
    split.fun=internal.split.labs,
    FUN=text,
    nspace=branch, minbranch=.3, do.par=TRUE,
    add.labs=TRUE, clip.left.labs=FALSE, fam.main="",
    yshift=0, yspace=space, shadow.offset=.4,
    split.adj=NULL, split.yshift=0, split.space=space,
    split.yspace=yspace, split.shadow.offset=shadow.offset,
    nn.adj=.5, nn.yshift=0, nn.space=.8, nn.yspace=.5,
    ygap=gap/2, under.ygap=.5, yesno.yshift=0,
    xcompact=TRUE, ycompact=uniform, xcompact.ratio=.8, min.inter.height=4,
    max.auto.cex=1, min.auto.cex=.15, ycompress.cex=.7, accept.cex=1.1,
    shift.amounts=c(1.5, 2),
    Fallen.yspace=.1, boxes.include.gap=FALSE,
    ...)

Arguments

x

An rpart object. The only required argument.

type

Type of plot. Five possibilities:

0 The default. Draw a split label at each split and a node label at each leaf.

1 Label all nodes, not just leaves. Similar to text.rpart's

extra

Display extra information at the nodes. Possible values:

0 No extra information (the default).

1 Display the number of observations that fall in the node (per class for class objects;

under

Applies only if extra > 0. Default FALSE, meaning put the extra text in the box. Use TRUE to put the text under the box. See also under.cex.
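As a rough illustration of extra and under, the sketch below plots the same tree twice, once with the observation counts inside the boxes and once with them underneath. It assumes the ptitanic data shipped with rpart.plot; the names tree, in.box and under.box are only illustrative.

```r
library(rpart.plot)   # also attaches rpart
data(ptitanic)
tree <- rpart(survived ~ ., data = ptitanic, cp = .02)

old.par <- par(mfrow = c(1, 2))   # two figures side by side
# extra=1: show the number of observations in each node, inside the box
in.box <- prp(tree, extra = 1, main = "extra = 1")
# under=TRUE: same information, but drawn under the box
under.box <- prp(tree, extra = 1, under = TRUE, main = "extra = 1, under = TRUE")
par(old.par)
```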

clip.right.labs

Default TRUE, meaning "clip" the right split labels, i.e. do not print variable=. Applies only if type=3 or 4. See also clip.left.labs.

nn

Display the node numbers. Default FALSE. (In the current implementation some overplotting may occur with nn=TRUE.)

ni

Display the node indices, i.e. the row numbers of the nodes in the object's frame. Default FALSE.

yesno

Default TRUE, meaning write yes and no on the appropriate sides of the top split. Ignored if type=3 or 4. (Use nn.col and the other nn parameters

fallen.leaves

Default FALSE. If TRUE, display the leaves at the bottom of the graph.

branch

Controls the shape of the branch lines. Specify a value between 0 (V shaped branches) and 1 (square shouldered branches). Default is if(fallen.leaves) 1 else .2.
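The effect of branch and fallen.leaves is easiest to judge side by side. The following is a minimal sketch, again assuming the ptitanic data from rpart.plot and an illustrative object name tree.

```r
library(rpart.plot)   # also attaches rpart
data(ptitanic)
tree <- rpart(survived ~ ., data = ptitanic, cp = .02)

old.par <- par(mfrow = c(1, 3))   # three figures on one page
prp(tree, branch = 0, main = "branch = 0")   # V shaped branches
prp(tree, branch = 1, main = "branch = 1")   # square shouldered branches
# with fallen.leaves=TRUE the default for branch becomes 1
fallen <- prp(tree, fallen.leaves = TRUE, main = "fallen.leaves = TRUE")
par(old.par)
```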

uniform

If TRUE (the default), the vertical spacing of the nodes is uniform. If FALSE, the nodes are spaced proportionally to the fit (more precisely, to the difference between a node's deviance and the sum of its two children's

left

Default TRUE, meaning the left side of a split is the path taken if the split condition is true. With left=FALSE the split labels are changed so the right side is true.

xflip

Default FALSE. If TRUE, flip the tree horizontally.

yflip

Default FALSE. If TRUE, flip the tree vertically, so the root is at the bottom.
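The orientation arguments left, xflip and yflip can be compared with a sketch like the one below. It assumes the ptitanic data from rpart.plot; tree and flipped are illustrative names.

```r
library(rpart.plot)   # also attaches rpart
data(ptitanic)
tree <- rpart(survived ~ ., data = ptitanic, cp = .02)

old.par <- par(mfrow = c(2, 2))   # four figures on one page
prp(tree, main = "default")
prp(tree, left = FALSE, main = "left = FALSE")   # right side is the "true" path
prp(tree, xflip = TRUE, main = "xflip = TRUE")   # mirror the tree horizontally
flipped <- prp(tree, yflip = TRUE, main = "yflip = TRUE")  # root at the bottom
par(old.par)
```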

Margin

Extra white space around the tree, as a fraction of the graph width. Default 0, meaning no extra space. To add say 10% space around the tree use Margin=0.1. (This is the margin argument of plot.r

gap

Minimum horizontal gap between the (possibly invisible) boxes, in character widths. Default NULL, meaning automatically choose a suitable value (normally 1, but if the graph is very crowded will be set to 0

digits

The number of significant digits in displayed numbers. Default 2. If 0, use getOption("digits"). Details: Numbers from 0.001 to 9999

varlen

Length of variable names in text at the splits (and, for class responses, the class in the node label). Default -8, meaning truncate to eight characters. Possible values: =0 use full names. >0 call

faclen

Length of factor level names in splits. Default 3, meaning abbreviate to three characters. Possible values are as varlen above, except that 1 is trea

cex

Default NULL, meaning calculate the text size automatically.

tweak

Adjust the (possibly automatically calculated) cex. Default 1, meaning no adjustment. Use say tweak=1.2 to make the text 20% larger. Note that font sizes are discrete, so the cex you ask f

compress

If TRUE (the default), make more space by shifting nodes horizontally where space is available. This often allows larger text. (This is the same as plot.rpart's argument of the same name, except that here the

ycompress

If TRUE (the default unless uniform=FALSE), make more space by shifting labels vertically where space is available. Actually, this only kicks in if the initial automatically calculated cex is less than

trace

Default FALSE. Use TRUE to print the automatically calculated cex, xlim, and ylim. Use integer values greater than 1 for more detailed tracing.
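One way to see what the automatic text sizing is doing is to turn on trace and then nudge the result with tweak. This is only a sketch, assuming the ptitanic data from rpart.plot; the printed values depend on the graphics device and window size.

```r
library(rpart.plot)   # also attaches rpart
data(ptitanic)
tree <- rpart(survived ~ ., data = ptitanic, cp = .02)

# print the automatically calculated cex, xlim, and ylim to the console
auto <- prp(tree, trace = TRUE)

# same plot, but ask for text about 20% larger than the automatic size
bigger <- prp(tree, tweak = 1.2, trace = TRUE)
```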

snip

Default FALSE. Set TRUE to interactively trim the tree with the mouse. See ../doc/prp.pdf (or just try it).

snip.fun

Function invoked after each mouse click when snip=TRUE. Default NULL, meaning no function. Otherwise set snip.fun to your own function with the prototype function(tree), where tree

box.col

Color of the boxes around the text.
     Default 0, meaning use the background color.

border.col

Color of the box border around the text.
     Default col, the color of the text in the box.
     Use 0 for no border.
     (Note: par settings like col can be passed in as ... arguments.
     If not pass

round

Controls the rounding of the corners of the node boxes.
     Default NULL, meaning calculate automatically.
     Else specify 0 for sharp edges, and values greater
     than 0 for rounded edges.
     Bigger is more r

leaf.round

Controls the rounding of the corners of the leaf node boxes.
     Default NULL, meaning use round.
     Else specify a value greater than or equal to 0.

shadow.col

Color of the shadow under the boxes.
     Default 0, no shadow.
     Try "gray" or "darkgray".
     (Note: overlapping shadows look better on devices that support alpha

prefix

Default "".
     Prepend this string to the node labels.
     So could be the name of the fitted response, for instance.
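For instance, prefix can carry the name of the response. A minimal sketch, assuming the ptitanic data from rpart.plot; the prefix text and the name labelled are arbitrary.

```r
library(rpart.plot)   # also attaches rpart
data(ptitanic)
tree <- rpart(survived ~ ., data = ptitanic, cp = .02)

# prepend the same string to every node label
labelled <- prp(tree, prefix = "fate: ")
```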

suffix

Default "".
     Append this string to the node labels.
     Text after a double newline "\n\n" (if any) will be plotted under the box.
     (Actually, double newlines can be used in any of the prefix or suffix
     argu

xsep

String which separates the individual counts and probabilities
     in node labels when extra>0.
     Default NULL, meaning automatically select: usually
     "  " (two spaces), but "/ " for rates.
     Use xsep="/"

under.font

Font of the text under the box. Default font (which can be passed in as a ... argument).

under.col

Color of the text under the box.
    Default 1.

under.cex

Size of the text under the box relative to the text in the box.
    Default .8, smaller than the text in the box.

The following control the split labels.

split.cex

Size of the split text relative to cex
     (which by default is calculated automatically).
     Default 1.

split.font

Font for the split labels.
     Default 2, bold.
     (Note: use font to change the node label text.)

split.family

Font family for the split labels.
     Default "",
     or use something like split.family="serif".
     (Note: use family to change the node label text.)

split.col

Color of the split label text.
     Default 1.
     (Note: use col to change the node label text.)

split.box.col

Color of the split boxes.
     Default 0, meaning use the background color.

split.border.col

Color of the split box borders.
     Default 0, invisible.

split.lty

Line type for the split box borders.
     The default is 1, but the border will be invisible
     unless you change the default split.border.col.
     (Note: use lty to change the node box borders.)

split.lwd

Line width of the split box border relative to cex
     (which by default is calculated automatically).
     The border is by default invisible, see split.border.col.

split.round

Controls the rounding of the corners of the split boxes.
     Default 0, meaning sharp corners.
     Else specify a value greater than or equal to 0.
     The split boxes are by default invisible,
     see split.box.col

split.shadow.col

Color of the shadow under the split boxes.
     Default 0, no shadow.

split.prefix

Default "".
     Prepend this string to the split labels.

right.split.prefix

Default split.prefix.
     Prepend this string to the right split labels.
     Applies only when type=3 or 4.

split.suffix

Default "".
     Append this string to the split labels.

right.split.suffix

Default split.suffix.
     Append this string to the right split labels.
     Applies only when type=3 or 4.

facsep

Default ",".
     String which separates the factor levels in split labels.
     See also xsep, which separates the individual counts when extra is used.

eq

Default " = ".
     String which separates the factor name from the levels in split labels.
     The idea is that you can add or remove spaces around the =, or use
     words if that suits you.

lt

Default " < ".
     String which represents "less than" in split labels.

ge

Default " >= ".
     String which represents "greater than or equal" in split labels.
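The separator arguments facsep, eq, lt and ge can be replaced with words. The sketch below assumes the ptitanic data from rpart.plot; the replacement strings and the name wordy are arbitrary choices for illustration.

```r
library(rpart.plot)   # also attaches rpart
data(ptitanic)
tree <- rpart(survived ~ ., data = ptitanic, cp = .02)

wordy <- prp(tree,
             faclen = 0,           # print full factor level names
             facsep = " or ",      # between factor levels
             eq     = " is ",      # between factor name and levels
             lt     = " under ",   # instead of " < "
             ge     = " at least ") # instead of " >= "
```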

The following control the branches.

branch.col

Color of the branch lines.
     Default 1, but set to "gray" if branch.type is nonzero.

branch.lty

Branch line type.
     Default 1.

branch.lwd

Line width of the branch lines relative to cex
     (which by default is calculated automatically).
     (Note: branch.lwd does not control the width of the
     "wide branches" drawn when branch.type is nonzero.)

branch.type

Default 0.
     If nonzero draw "wide branches", with branch widths proportional
     to the parameter selected by branch.type as follows:
     0 The default.  The branch lines are drawn conventionally.
1

branch.tweak

Default 1.
     Applies only if branch.type is nonzero.
     Use this argument to scale the widths of the branches,
     for example, branch.tweak=.5 to halve the width of the branches.
     (By default, prp

min.branch.width

Default 0.002.
    Applies only if branch.type is nonzero.
    The minimum width of a branch, as a fraction of the page width.
    The width of branches that would be thinner than min.branch.width is clamped.
    Inc

branch.fill

Color used to fill the wide branch lines.
     Applies only if branch.type is nonzero.
     Default branch.col.

The following control the node numbers (with nn=TRUE).

nn.cex

Default NULL, meaning calculate the cex
     of the node numbers automatically.
     This and the following arguments apply only when nn=TRUE.

nn.font

Font for the node numbers.
     Default 3, italic.

nn.family

Font family for the node numbers.
     Default "".

nn.col

Color of the node number text.
     Default 1.

nn.box.col

Color of the boxes around the node numbers.
     Default 0, meaning use the background color.

nn.border.col

Color of the box border around the node numbers.
     Default nn.col.

nn.lty

Line type of the node number box border.
     Default 1.

nn.lwd

Line width of the node box border relative to cex
     (which by default is calculated automatically).
     Default NULL, meaning use lwd (which can be passed in as a ... argument).

nn.round

Controls the rounding of the corners of the node number boxes.
     Default .3, meaning small corners.
     Else specify a value greater than or equal to 0.
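The nn arguments only take effect together with nn=TRUE. A minimal sketch, assuming the ptitanic data from rpart.plot; the colors and the name numbered are arbitrary.

```r
library(rpart.plot)   # also attaches rpart
data(ptitanic)
tree <- rpart(survived ~ ., data = ptitanic, cp = .02)

numbered <- prp(tree,
                nn            = TRUE,           # display the node numbers
                nn.col        = "blue",         # node number text color
                nn.font       = 2,              # bold instead of the default italic
                nn.box.col    = "lightyellow",  # fill of the node number boxes
                nn.border.col = "blue",         # border of the node number boxes
                nn.round      = 0)              # sharp corners
```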

The following are user definable functions.

node.fun

The function that generates the text at the node labels.
     The default is internal.node.labs,
     which specifies a function internal to prp.
     This is necessary for full support of extra as described
     in

split.fun

The function that generates the text at the splits.
     Default internal.split.labs,
     which specifies a function internal to prp.
     Otherwise set split.fun to your own function with the
     prototype f

FUN

The function that displays the text on the screen.
     Default text.

The following are esoteric parameters, mostly for the graph layout engine.

nspace

Applies only when compress=TRUE.
     Default nspace=branch.
     The size of the space between a split and a leaf, relative to
     the space between leaves.

minbranch

Applies only when uniform=FALSE.  Default .3.
      The minimum height between levels is clamped at
      minbranch times the mean interlevel distance.
      Needed because sometimes a split gives little or no improv

do.par

Default TRUE, meaning adjust the mar parameter
     so the tree fills the figure region.
     This also sets xpd=NA.
     These graphic parameters are restored to their original
     state before prp exi

add.labs

Default TRUE, meaning display the labels.
     If FALSE, gives a bare bones display similar to plot.rpart.

clip.left.labs

Like clip.right.labs but for the left labels.
     Default is FALSE.
     Note that clip.left.labs and clip.right.labs can be vectors,
     indexed on the split number.
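Label clipping applies only with type=3 or 4, so a comparison needs one of those types. The sketch below assumes the ptitanic data from rpart.plot; clipped and unclipped are illustrative names.

```r
library(rpart.plot)   # also attaches rpart
data(ptitanic)
tree <- rpart(survived ~ ., data = ptitanic, cp = .02)

old.par <- par(mfrow = c(1, 2))   # two figures side by side
# default: right split labels are clipped, i.e. printed without "variable="
clipped <- prp(tree, type = 4, faclen = 0,
               main = "clip.right.labs = TRUE")
# show the variable name on both sides of each split
unclipped <- prp(tree, type = 4, faclen = 0, clip.right.labs = FALSE,
                 main = "clip.right.labs = FALSE")
par(old.par)
```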

fam.main

Font family for the main text.  Default "".
     The (inconsistent) name was chosen to minimize partial matching
     with main and family, which can be passed in as ... arguments.

yshift

Vertical position of the labels, in character heights relative to their default position.
     Default 0.
     Negative values move the text down; positive up
     (the box around the text will follow along).

space

Horizontal space to the box border on each side of the node label text,
     in character widths.
     Default 1.
     Use this (and yspace) for bigger boxes.
     Since this affects the size of the (possibly invisible) boxes,

yspace

Vertical space to the box border above and below the node label text, in character heights.
     Default space.
     See the comments for space.

shadow.offset

Offset of the shadow from the boxes, in character widths.
     Default .4
     (but the shadow will be invisible unless the default shadow.col is changed).

split.adj

Horizontal position of the split text.
     In string width units, as is the convention for adj arguments.
     Default NULL, meaning use adj
     (which defaults to 0.5 but can be passed in as a ...argu

split.yshift

Vertical position of the split labels,
     in character heights relative to their default positions.
     Default 0.
     Negative values move the text down; positive up
     (the box around the text will follow along).
     This adjusts the

split.space

Horizontal space between the split label text and the box,
     in character widths.
     Default space.
     Affects the size of the box drawn around the text.
     The split boxes are by default invisible
     (see split.box.col

split.yspace

Vertical space between the split label text and the box, in character heights.
     Default yspace.

split.shadow.offset

Offset of the shadow from the split boxes, in character widths.
     Default shadow.offset.
     (but the shadow will be invisible unless the default shadow.col is changed).

nn.adj

Horizontal position of the node label text.
     Default .5.

nn.yshift

Vertical position of the node numbers, in character heights relative to their default positions.
     Default 0.

nn.space

Horizontal space to the box border on each side of the node number text,
     in character widths.
     Default .8.

nn.yspace

Vertical space to the box border above and below the node number text, in character heights.
     Default .5.

under.ygap

Applies if text is plotted under the box
    (i.e. if under=TRUE or there is a double newline in prefix or suffix).
    Vertical gap (in char heights) between the lower edge of the box and
    the top of the text und

yesno.yshift

Vertical position of "yes" and "no"
     in character heights relative to their default position.
     Default 0.
     Applies only when yesno=TRUE.

ygap

Minimum vertical gap between boxes, in character heights.
     Default gap/2.

xcompact

If TRUE (the default) and there is too much white space,
     automatically change xlim to compact the entire tree horizontally.
     This usually only activates for small trees.
     (The xcompact and ycompact

ycompact

If TRUE (the default) and there is too much vertical space,
     automatically change ylim to compact the entire tree vertically.

xcompact.ratio

Default .8.
    Applies only when xcompact=TRUE.
    The maximum possible without overplotting is 1,
    but compacting by .8 usually
    gives more pleasing spacing (it gives more space).

min.inter.height

Default 4.
    Applies only when ycompact=TRUE.
    Minimum height (in units of character height)
    between the lowest label in a layer and the highest label in the layer below it.

max.auto.cex

Clamp the maximum automatically calculated cex at this value.
    Default 1, meaning never expand cex, only contract when necessary.

min.auto.cex

Default .15.
    Never downscale to less than this when automatically calculating cex,
    even if overplotted labels result.
    (The graph layout algorithm is unstable with cex's below 0.15
    --- mea

ycompress.cex

Default .7.
    Applies only when ycompress=TRUE.
    Apply the ycompress algorithm if the initial automatically calculated cex is less than this.
    The idea is that we don't want to shift if we get an

accept.cex

Accept shifting only if it causes at least this much improvement in cex
    (because we don't want to shift if it gives only a small improvement in cex).
    Default 1.1 i.e. require at least a 10% improvement.
    Use 0

shift.amounts

Default c(1.5, 2).
    For ycompress, choose the best cex yielded
    by shifting nodes by these amounts,
    in multiples of the box heights (after initial scaling).

Fallen.yspace

Extra space for fallen leaves.
    Default .1, meaning allow 10% of the vertical space for the fallen leaves.
    (The name Fallen.yspace uses upper case to avoid partial matching with fallen.leaves.)

boxes.include.gap

Default FALSE.
    Draw the boxes to include gap and ygap, for debugging.
    This only affects the way the boxes are drawn, not the graph layout algorithm in any way.
    With the optimum cex at least o

...

Extra par arguments.  Only the "important" par arguments are supported.
    Note that arguments like col apply only to the node labels.
    To affect the split labels or bra

`Value`

A list with the following components.
With the default args most of these are automatically calculated.

obj

The rpart object. Identical to the x argument passed in unless snip was used.

snipped.nodes

The snipped nodes, NULL unless snip was used.

xlim, ylim

The graph limits.

x, y

The node coords.

branch.x, branch.y

The branch line coords.

labs

The node labels.

cex

The node label cex.

boxes

The coords of the boxes around the nodes.

split.labs

The split labels.

split.cex

The split label cex.

split.boxes

The coords of the boxes around the splits.
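The returned list can be captured and inspected, for example to reuse the automatically calculated cex. A minimal sketch, assuming the ptitanic data from rpart.plot; res is an illustrative name and the printed values depend on the graphics device.

```r
library(rpart.plot)   # also attaches rpart
data(ptitanic)
tree <- rpart(survived ~ ., data = ptitanic, cp = .02)

res <- prp(tree)      # the plot is drawn and the list is returned

names(res)            # the components described above
res$cex               # the node label cex that was used
res$xlim              # the graph limits in x
class(res$obj)        # the rpart object (unchanged, since snip was not used)
```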

`See Also`

../doc/prp.pdf
rpart.plot
plot.rpart
text.rpart
rpart

`Examples`

library(rpart.plot)      # also attaches rpart
data(ptitanic)
tree <- rpart(survived ~ ., data=ptitanic, cp=.02)
                         # cp=.02 because want small tree for demo

old.par <- par(mfrow=c(2,2))
                         # put 4 figures on one page

prp(tree, main="default prp\n(type = 0, extra = 0)")

prp(tree, main="type = 4,  extra = 6", type=4, extra=6, faclen=0)
                         # faclen=0 to print full factor names

cols <- ifelse(tree$frame$yval == 1, "darkred", "green4")
                         # green if survived

prp(tree, main="assorted arguments",
    extra=106,           # display prob of survival and percent of obs
    nn=TRUE,             # display the node numbers
    fallen.leaves=TRUE,  # put the leaves on the bottom of the page
    branch=.5,           # change angle of branch lines
    faclen=0,            # do not abbreviate factor levels
    trace=1,             # print the automatically calculated cex
    shadow.col="gray",   # shadows under the leaves
    branch.lty=3,        # draw branches using dotted lines
    split.cex=1.2,       # make the split text larger than the node text
    split.prefix="is ",  # put "is " before split text
    split.suffix="?",    # put "?" after split text
    col=cols, border.col=cols,   # green if survived
    split.box.col="lightgray",   # lightgray split boxes (default is white)
    split.border.col="darkgray", # darkgray border on split boxes
    split.round=.5)              # round the split box corners a tad

# the old way for comparison
plot(tree, uniform=TRUE, compress=TRUE, branch=.2)
text(tree, use.n=TRUE, cex=.6, xpd=NA) # cex is a guess, depends on your window size
title("plot.rpart for comparison", cex=.6)

par(old.par)