vtree is a tool for drawing variable trees. Variable trees display information about nested subsets of a data frame, in which the subsetting is defined by the values of categorical variables.
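To make "nested subsets" concrete, here is a minimal base R sketch of the counts a two-level tree over Severity and Sex is built from. It does not call vtree and does not draw anything. The data frame is a hypothetical toy example, not the FakeData set used in the examples.

```r
# Toy data frame (hypothetical; not the FakeData set used in the examples)
df <- data.frame(
  Severity = c("Mild", "Mild", "Severe", "Severe", "Severe", "Mild"),
  Sex      = c("F",    "M",    "F",      "M",      "M",      "F"),
  stringsAsFactors = FALSE
)

# Level 1: one subset per value of Severity
level1 <- table(df$Severity)

# Level 2: within each Severity subset, one subset per value of Sex
level2 <- lapply(split(df, df$Severity), function(sub) table(sub$Sex))

print(level1)   # Mild 3, Severe 3
print(level2)   # Mild: F 2, M 1; Severe: F 1, M 2
```

Each level-2 table sums to its parent's count, which is what makes the subsets nested.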
vtree(z, vars, showempty = FALSE, title = "", prune = list(),
prunebelow = list(), keep = list(), follow = list(),
labelnode = list(), labelvar = NULL, text = list(),
check.is.na = FALSE, nodefunc = NULL, nodeargs = NULL,
HTMLtext = FALSE, showlevels = TRUE, levelshape = "none",
digits = 0, splitwidth = 20, color = c("blue", "forestgreen",
"red", "orange", "pink"), colornodes = FALSE,
fillcolor = rep(c("#EFF3FF", "#C6DBEF", "#9ECAE1", "#6BAED6",
"#4292C6", "#2171B5", "#084594"), 8), fillnodes = TRUE,
getscript = FALSE, nodesep = 0.5, ranksep = 0.5, vp = TRUE,
root = 1, last = 1, top = TRUE, horiz = TRUE, rounded = TRUE,
summary = "", leafonly = FALSE, width = NULL, height = NULL)
z: Data frame, or a single vector.
vars: Either a character string of space-separated variable names or a vector of variable names. (Does not need to be specified if z is a vector.)
showempty: Show empty (N=0) nodes?
title: Title for the top node of the tree. (If not specified, "Sample" will be shown.)
prune: List of vectors identifying nodes to prune. The name of each element of the list is one of the variable names in vars. Each element is a vector of character strings identifying the values of that variable (i.e. the nodes) to prune.
prunebelow: Same as prune, except that the nodes themselves are not pruned, only their descendants.
keep: Which nodes should be kept (i.e. not pruned).
follow: Which nodes should be "followed". Here, "followed" means that the descendants of a node (and their descendants, etc.) will be shown.
labelnode: List of vectors used to change how variable values are displayed. The name of each element of the list is one of the variable names in vars. Each element of the list is a vector of character strings: the strings are the existing levels of the variable, and the names of the vector are the corresponding new levels.
labelvar: A named vector of labels for labeling nodes. Each element is a header for the nodes of the corresponding variable.
text: A list of vectors used to specify extra text to add to certain nodes. (See Details for text formatting.)
check.is.na: Replace each variable with a logical vector indicating whether or not each of its values is missing. This is one case where vtree can use continuous variables.
nodefunc: A node function.
nodeargs: A list containing arguments for the node function.
HTMLtext: Is the text formatted in HTML?
showlevels: Show the names of the variables at each level?
levelshape: The shape of the boxes showing the levels of the tree.
digits: Number of decimal digits to show in percentages.
splitwidth: The minimum number of characters before an automatic line break is created.
color: A vector of color names for the outline of the nodes at each level (see Graphviz `color` attribute).
colornodes: Should the node boxes be colored?
fillcolor: A vector of color names for filling the nodes at each level (see Graphviz `fillcolor` attribute).
fillnodes: Should the nodes be filled with color?
getscript: Instead of displaying the flowchart, return the DOT script as a character string?
nodesep: Graphviz attribute: node separation amount.
ranksep: Graphviz attribute: rank separation amount.
vp: Use "valid percentages"? Valid percentages are computed by first excluding any missing values, i.e. restricting attention to the set of "valid" observations. The denominator is thus the number of non-missing observations. When vp=TRUE, nodes for missing values show the number of missing values but do not show a percentage; the other nodes show valid percentages. When vp=FALSE, all nodes (including nodes for missing values) show percentages of the total number of observations.
root: Is this the root node?
last: Is this the last node?
top: Is this the top?
horiz: Should the tree be drawn horizontally? (i.e. root node on the left, with the tree growing to the right)
rounded: Should the nodes have rounded boxes?
summary: A character string used to specify summary statistics to display in each node. The first word in the character string is the name of the variable to be summarized. The rest of the character string is the string that will be displayed, in which special codes are transformed into summary statistics, etc. See Details for the special codes.
leafonly: Should the summary information only be shown in leaf nodes?
width: Width to be passed to grViz.
height: Height to be passed to grViz.
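The difference between vp=TRUE and vp=FALSE comes down to the denominator. The base R sketch below, which does not call vtree, computes both kinds of percentage for a single hypothetical variable with two missing values; the local digits value only mirrors the digits argument and is an assumption about how rounding is applied.

```r
# Hypothetical categorical variable with two missing values
x <- c("Mild", "Severe", NA, "Mild", NA, "Mild", "Severe", "Mild")
digits <- 0                      # mirrors the digits argument

n_total   <- length(x)           # 8 observations in total
n_missing <- sum(is.na(x))       # 2 missing
n_valid   <- n_total - n_missing # 6 valid (non-missing)

# table() omits NA by default; convert to a plain named integer vector
counts <- table(x)
counts <- setNames(as.vector(counts), names(counts))

# vp = TRUE: percentages use the valid observations as the denominator;
# the missing-value node would show only its count (n_missing)
valid_pct <- round(100 * counts / n_valid, digits)

# vp = FALSE: every node, including the missing-value node,
# uses the total number of observations as the denominator
total_pct <- round(100 * c(counts, "NA" = n_missing) / n_total, digits)

print(valid_pct)   # Mild 67, Severe 33
print(total_pct)   # Mild 50, Severe 25, NA 25
```

With valid percentages the non-missing nodes sum to 100% on their own, which is usually what a reader expects when missingness is not of interest.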
Special codes for the summary argument:
%mean%: mean
%SD%: standard deviation
%min%: minimum
%max%: maximum
%pX%: Xth percentile, e.g. %p50% means the 50th percentile
%median%: median, i.e. p50
%IQR%: interquartile range, i.e. p25, p75
%list%: list of the individual values
%mv%: the number of missing values
%v%: the name of the variable
%noroot%: flag: do not show the summary in the root node.
%leafonly%: flag: only show the summary in leaf nodes.
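As an illustration of what the numeric codes stand for, the base R sketch below computes them for the values of one hypothetical node and substitutes them into a template string. The fill_codes helper is hypothetical and is not vtree's own implementation; vtree's actual formatting (rounding, separators) may differ. Percentiles use quantile() with its default method, which is also an assumption.

```r
# Hypothetical values of a continuous variable within one node
age <- c(34, 51, NA, 47, 29, 62)

# One entry per special code (names match the codes without the % signs)
node_stats <- list(
  mean   = mean(age, na.rm = TRUE),                    # %mean%
  SD     = sd(age, na.rm = TRUE),                      # %SD%
  min    = min(age, na.rm = TRUE),                     # %min%
  max    = max(age, na.rm = TRUE),                     # %max%
  median = median(age, na.rm = TRUE),                  # %median% (= %p50%)
  p25    = unname(quantile(age, 0.25, na.rm = TRUE)),  # %p25%
  p75    = unname(quantile(age, 0.75, na.rm = TRUE)),  # %p75%
  mv     = sum(is.na(age))                             # %mv%
)

# Hypothetical helper: replace each %code% in a template with its value
fill_codes <- function(template, values, digits = 1) {
  out <- template
  for (code in names(values)) {
    formatted <- format(round(values[[code]], digits))
    out <- gsub(paste0("%", code, "%"), formatted, out, fixed = TRUE)
  }
  out
}

fill_codes("mean %mean%, median %median%, missing %mv%", node_stats)
# "mean 44.6, median 47, missing 1"
```

In vtree itself the first word of summary names the variable (e.g. "age mean %mean%"), and the remainder plays the role of the template here.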
Formatting codes for the text argument (also used by labelnode and labelvar):
\n: line break
*...*: italics
**...**: bold
^...^: superscript (using 10 point font)
~...~: subscript (using 10 point font)
%%red ...%%: display text in red (or whichever color is specified)
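One way to picture these codes is as shorthand for Graphviz HTML-like label tags. The sketch below is a hypothetical translator (markup_to_html is not a vtree function, and vtree's internal conversion may differ); it only shows a plausible mapping of each code onto standard Graphviz tags. Bold is handled before italics so that ** is not read as two single asterisks.

```r
# Hypothetical translator from the formatting codes to Graphviz HTML-like labels
markup_to_html <- function(x) {
  x <- gsub("\n", "<BR/>", x, fixed = TRUE)                          # line break
  x <- gsub("\\*\\*(.+?)\\*\\*", "<B>\\1</B>", x, perl = TRUE)       # bold
  x <- gsub("\\*(.+?)\\*", "<I>\\1</I>", x, perl = TRUE)             # italics
  x <- gsub("\\^(.+?)\\^",
            '<FONT POINT-SIZE="10"><SUP>\\1</SUP></FONT>', x, perl = TRUE)  # superscript
  x <- gsub("~(.+?)~",
            '<FONT POINT-SIZE="10"><SUB>\\1</SUB></FONT>', x, perl = TRUE)  # subscript
  x <- gsub("%%([A-Za-z]+) (.+?)%%",
            '<FONT COLOR="\\1">\\2</FONT>', x, perl = TRUE)          # colored text
  x
}

markup_to_html("**Severe** cases\nmean x^2^ in %%red red text%%")
# <B>Severe</B> cases<BR/>mean x<FONT POINT-SIZE="10"><SUP>2</SUP></FONT> in <FONT COLOR="red">red text</FONT>
```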
# A single-level hierarchy
vtree(FakeData,"Severity")
# A two-level hierarchy
vtree(FakeData,"Severity Sex")
# A two-level hierarchy with pruning of some values of Severity
vtree(FakeData,"Severity Sex",prune=list("Severity"=c("Moderate","NA")))
# Rename some nodes
vtree(FakeData,"Severity Sex",labelnode=list(Sex=c("Male"="M","Female"="F")))
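In the labelnode example, each vector reads as new label = existing value, so "M" is displayed as "Male". The lookup can be sketched in base R as follows; relabel is a hypothetical helper for illustration, not part of vtree, and leaving unmatched values (including NA) unchanged is an assumption.

```r
# Hypothetical helper: map existing values to new labels using a
# named vector whose names are the new labels and whose values are the old ones
relabel <- function(x, map) {
  idx <- match(x, map)                      # position of each value among the old levels
  ifelse(is.na(idx), x, names(map)[idx])    # values not in the map are left unchanged
}

sex <- c("M", "F", "F", NA, "M")
relabel(sex, c("Male" = "M", "Female" = "F"))
# "Male" "Female" "Female" NA "Male"
```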