vtree is a tool for drawing variable trees. Variable trees display information about nested subsets of a data frame, in which the subsetting is defined by the values of categorical variables.
vtree(z, vars, prune = list(), prunebelow = list(), keep = list(),
follow = list(), prunelone = NULL, pruneNA = FALSE,
labelnode = list(), labelvar = NULL, fillcolor = NULL,
fillnodes = TRUE, NAfillcolor = "white", rootfillcolor = "#EFF3FF",
palette = NULL, gradient = TRUE, revgradient = FALSE,
singlecolor = 2, colorvarlabels = TRUE, title = "",
sameline = FALSE, Venn = FALSE, check.is.na = FALSE, seq = FALSE,
text = list(), plain = FALSE, squeeze = 1, shownodelabels = TRUE,
showvarnames = TRUE, showlevels = TRUE, showpct = TRUE,
showlpct = TRUE, showcount = TRUE, showlegend = FALSE,
varnamepointsize = 20, HTMLtext = FALSE, digits = 0,
splitwidth = 20, lsplitwidth = 15, getscript = FALSE,
nodesep = 0.5, ranksep = 0.5, margin = 0.2, vp = TRUE,
horiz = TRUE, summary = "", width = NULL, height = NULL,
graphattr = "", nodeattr = "", edgeattr = "", color = c("blue",
"forestgreen", "red", "orange", "pink"), colornodes = FALSE,
showempty = FALSE, rounded = TRUE, nodefunc = NULL,
nodeargs = NULL, parent = 1, last = 1, root = TRUE)
z
Required: Data frame, or a single vector.
vars
Required (unless z is a vector): Either a character string of whitespace-separated variable names or a vector of variable names.
prune
List of vectors that identifies nodes to prune. The name of each element of the list must be one of the variable names in vars. Each element is a vector of character strings that identifies the values of the variable (i.e. the nodes) to prune.
prunebelow
Like prune, but the nodes themselves are not pruned, just their descendants.
keep
Like prune, but specifies which nodes should be kept (i.e. not pruned).
follow
Like prune, but specifies which nodes should be "followed". For the variables named, only the descendants of nodes that are followed will be shown.
prunelone
A vector of values specifying lone nodes (of any variable) to prune. A lone node is a node that has no siblings.
pruneNA
Prune all missing values? This should be used carefully because "valid" percentages are hard to interpret when NAs are pruned.
labelnode
List of vectors used to change how values of variables are displayed. The name of each element of the list is one of the variable names in vars. Each element of the list is a vector of character strings, representing the values of the variable. The names of the vector represent the labels to be used in place of the values.
labelvar
A named vector of labels for variables.
fillcolor
A named vector of colors for filling the nodes of each variable. If an unnamed, scalar color is specified, all nodes will have this color.
fillnodes
Should the nodes be filled with color?
NAfillcolor
Fill color for missing value nodes. If NULL, fill colors of missing value nodes will be consistent with the fill colors in the rest of the tree.
rootfillcolor
Fill color for the root node.
palette
A vector of palette numbers (which can range between 1 and 9). The names of the vector indicate the corresponding variable. See Palettes below for more information.
gradient
Should gradients of fill color be used across the values of each variable? A single value (with no names) specifies the setting for all variables. A logical vector of TRUE values for named variables is interpreted as TRUE for those variables and FALSE for all others. A logical vector of FALSE values for named variables is interpreted as FALSE for those variables and TRUE for all others.
revgradient
Should the gradient be reversed (i.e. dark to light instead of light to dark)? A single value (with no names) specifies the setting for all variables. A logical vector of TRUE values for named variables is interpreted as TRUE for those variables and FALSE for all others. A logical vector of FALSE values for named variables is interpreted as FALSE for those variables and TRUE for all others.
singlecolor
When a variable has a single value, should its nodes be colored light (1), medium (2), or dark (3)?
colorvarlabels
Should the variable labels be colored?
title
Optional title for the root node of the tree.
sameline
Should node labels be on the same line as node percentages?
Venn
Display multi-way set membership information? This provides an alternative to a Venn diagram. This sets showpct=FALSE, shownodelabels=FALSE. Assumption: all of the specified variables are logicals or 0/1 numeric variables.
check.is.na
Replace each variable named in vars with a logical vector indicating whether or not each of its values is missing?
seq
Display the variable tree using "sequences"? Each unique sequence (i.e. pattern) of values will be shown separately.
text
A list of vectors containing extra text to add to specified nodes. The name of each element of the list must be one of the variable names in vars. Each element is a vector of character strings. The names of the vector identify the nodes to which the text should be added. (See Formatting codes below for information on how to format text.)
plain
Use "plain" color settings?
squeeze
How much should the tree be "squeezed"? A value between 0 and 1. This controls two Graphviz parameters: margin and nodesep.
shownodelabels
Should node labels be shown? A single value (with no names) specifies the setting for all variables. A logical vector of TRUE values for named variables is interpreted as TRUE for those variables and FALSE for all others. A logical vector of FALSE values for named variables is interpreted as FALSE for those variables and TRUE for all others.
showvarnames
Show the name of the variable next to each level of the tree?
showlevels
(Deprecated) Same as showvarnames.
showpct
Show percentage in each node? A single value (with no names) specifies the setting for all variables. A logical vector of TRUE for named variables is interpreted as TRUE for those variables and FALSE for all others. A logical vector of FALSE for named variables is interpreted as FALSE for those variables and TRUE for all others.
showlpct
Show the (marginal) percentages for values of each variable in the legend?
showcount
Show count in each node? A single value (with no names) specifies the setting for all variables. A logical vector of TRUE for named variables is interpreted as TRUE for those variables and FALSE for all others. A logical vector of FALSE for named variables is interpreted as FALSE for those variables and TRUE for all others.
showlegend
Show legend (including marginal frequencies) for each variable?
varnamepointsize
Font size (in points) to use when displaying variable names.
HTMLtext
Is the text formatted in HTML?
digits
Number of decimal digits to show in percentages.
splitwidth
The minimum number of characters before an automatic linebreak is inserted.
lsplitwidth
The minimum number of characters before an automatic linebreak is inserted for legends.
getscript
Instead of displaying the variable tree, return the DOT script as a character string?
nodesep
Graphviz attribute: Node separation amount.
ranksep
Graphviz attribute: Rank separation amount.
margin
Graphviz attribute: node margin.
vp
Use "valid percentages"? Valid percentages are computed by first excluding any missing values, i.e. restricting attention to the set of "valid" observations. The denominator is thus the number of non-missing observations. When vp=TRUE, nodes for missing values show the number of missing values but do not show a percentage; all the other nodes show valid percentages. When vp=FALSE, all nodes (including nodes for missing values) show percentages of the total number of observations.
horiz
Should the tree be drawn horizontally? (i.e. parent node on the left, with the tree growing to the right)
summary
A character string used to specify summary statistics to display in the nodes. The first word in the character string is the name of the variable to be summarized. The rest of the character string is the text that will be displayed, along with special codes specifying the information to display (see Summary codes below). A vector of character strings can also be specified, so that more than one variable may be summarized.
width
Width (in pixels) to be passed to grViz.
height
Height (in pixels) to be passed to grViz.
graphattr
Character string: Additional attributes for the Graphviz graph.
nodeattr
Character string: Additional attributes for Graphviz nodes.
edgeattr
Character string: Additional attributes for Graphviz edges.
color
A vector of color names for the outline of the nodes at each level.
colornodes
Should the node outlines be colored?
showempty
Show nodes that do not contain any observations?
rounded
Should the nodes have rounded boxes?
nodefunc
A node function (see Node functions below).
nodeargs
A list containing named arguments for the node function specified by nodefunc.
parent
Parent node number (Internal use only.)
last
Last node number (Internal use only.)
root
Is this the root node of the tree? (Internal use only.)
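The two denominators described for vp above can be checked by hand. The following is a minimal base R sketch of that arithmetic, not vtree's own code; the vector severity is made-up illustration data.

```r
# Made-up data for illustration: 8 observations, 2 of them missing.
severity <- c("Mild", "Moderate", NA, "Mild", "Severe", NA, "Mild", "Moderate")

# Counts per value, with a separate count for NA.
counts <- table(severity, useNA = "ifany")

# vp = TRUE analogue: the denominator is the number of non-missing observations.
n_valid <- sum(!is.na(severity))
valid_pct <- 100 * table(severity) / n_valid

# vp = FALSE analogue: the denominator is the total number of observations.
total_pct <- 100 * counts / length(severity)

print(round(valid_pct, 1))
print(round(total_pct, 1))
```

Here the valid percentage for Mild is based on 3 of 6 non-missing observations, whereas the overall percentage is based on 3 of all 8 observations.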
If getscript=TRUE, returns a character string of DOT script that describes the variable tree.
If getscript=FALSE, returns an object of class htmlwidget that will intelligently print itself into HTML in a variety of contexts including the R console, within R Markdown documents, and within Shiny output bindings.
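As a brief sketch of the two return modes, the following reuses the FakeData data frame and the Severity and Sex variables from the examples on this page; nothing else is assumed beyond the getscript argument described above.

```r
library(vtree)

# Return the DOT script as a character string instead of drawing the tree.
script <- vtree(FakeData, "Severity Sex", getscript = TRUE)
cat(script)

# With the default getscript = FALSE, the result is an object that
# displays the tree when printed.
tree <- vtree(FakeData, "Severity Sex")
class(tree)
```

Capturing the DOT script can be useful when the tree needs to be inspected or edited as text before rendering.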
Summary codes for the summary argument.
%mean%
mean
%SD%
standard deviation
%min%
minimum
%max%
maximum
%pX%
Xth percentile, e.g. p50 means the 50th percentile
%median%
median, i.e. p50
%IQR%
interquartile range, i.e. p25, p75
%list%
list of the individual values
%mv%
the number of missing values
%v%
the name of the variable
%noroot%
flag: Do not show summary in the root node.
%leafonly%
flag: Only show summary in leaf nodes.
%var=n%
flag: Only show summary in nodes of the specified variable.
%trunc=n%
flag: Truncate the summary to the first n characters.
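The codes above are combined inside a single summary string. The following sketch reuses FakeData and its id variable from the examples on this page; the display text "IDs:" is an arbitrary choice for illustration.

```r
library(vtree)

# The first word ("id") names the variable to summarize. The rest is the
# text to display: %list% is replaced by the individual values, %leafonly%
# restricts the summary to leaf nodes, and %trunc=30% truncates the summary
# to its first 30 characters.
tree <- vtree(FakeData, "Severity Sex",
              summary = "id \nIDs: %list% %leafonly% %trunc=30%")
tree
```

Flags such as %leafonly% and %trunc=n% control where and how the summary appears rather than adding text of their own.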
Node functions provide a mechanism for running a function within each subset representing a node of the tree. The summary parameter uses node functions.
A node function is a function that takes as arguments a data frame subset, the name of the subsetting variable, the value of the subsetting variable, and a list of named arguments.
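To make the four-argument shape concrete, here is a hypothetical node function called directly on a hand-built subset. The function name count_rows, the argument names, and the prefix argument are all invented for illustration. This page does not state what a node function should return, so returning a character string is an assumption.

```r
# Hypothetical node function. Arguments follow the order described above:
# data frame subset, name of the subsetting variable, value of that
# variable, and a list of named arguments.
count_rows <- function(u, varname, value, args) {
  # Assumption: the function returns text describing the subset.
  paste0(args$prefix, nrow(u))
}

# Made-up data and a subset corresponding to the node Severity = "Mild".
df <- data.frame(Severity = c("Mild", "Mild", "Severe"),
                 Sex = c("M", "F", "F"))
subset_mild <- df[df$Severity == "Mild", ]

result <- count_rows(subset_mild, "Severity", "Mild", list(prefix = "rows: "))
print(result)
```

In a call to vtree, such a function would be supplied through nodefunc and its named arguments through nodeargs.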
Formatting codes for the text argument. Also used by labelnode and labelvar.
\n
line break
*...*
italics
**...**
bold
^...^
superscript (using 10 point font)
~...~
subscript (using 10 point font)
%%red ...%%
display text in red (or whichever color is specified)
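As a sketch of how these codes are used, the following adds italic text on a new line to one node. It reuses FakeData and the Moderate value of Severity from the examples on this page; the added wording is arbitrary.

```r
library(vtree)

# The list element is named after the variable (Severity); the vector
# element is named after the node (Moderate). "\n" starts a new line and
# *...* marks italics.
tree <- vtree(FakeData, "Severity",
              text = list(Severity = c(Moderate = "\n*check these*")))
tree
```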
Sequential palettes from Color Brewer:
1 Reds
2 Blues
3 Greens
4 Oranges
5 Purples
6 YlGn
7 PuBu
8 PuRd
9 YlOrBr
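A short sketch of choosing palettes per variable follows. It assumes that palette numbers 1 to 9 follow the order of the list above (so 3 would be Greens and 4 would be Oranges), and it reuses FakeData from the examples on this page.

```r
library(vtree)

# Named vector: one palette number per variable.
# gradient = c(Sex = FALSE) turns the gradient off for Sex and, per the
# description of gradient, leaves it on for all other variables.
tree <- vtree(FakeData, "Severity Sex",
              palette = c(Severity = 3, Sex = 4),
              gradient = c(Sex = FALSE))
tree
```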
# NOT RUN {
# A single-level hierarchy
vtree(FakeData,"Severity")
# A two-level hierarchy
vtree(FakeData,"Severity Sex")
# A two-level hierarchy with pruning of some values of Severity
vtree(FakeData,"Severity Sex",prune=list("Severity"=c("Moderate","NA")))
# Rename some nodes
vtree(FakeData,"Severity Sex",labelnode=list(Sex=(c("Male"="M","Female"="F"))))
# Rename a variable
vtree(FakeData,"Severity Sex",labelvar=c(Severity="How bad?"))
# Show legend. Put labels on the same line as counts and percentages
vtree(FakeData,"Severity Sex Viral",sameline=TRUE,showlegend=TRUE)
# Using the summary parameter to list ID numbers (truncated to 40 characters) in specified nodes
vtree(FakeData,"Severity Sex",summary="id \nid = %list% %var=Severity% %trunc=40%")
# }
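The examples above cover prune only. As a further sketch under the same assumptions (FakeData with the Severity and Sex variables and a Moderate value), keep and prunebelow take the same list structure:

```r
library(vtree)

# Keep only the Moderate node of Severity (all other Severity nodes
# are pruned).
kept <- vtree(FakeData, "Severity Sex",
              keep = list(Severity = "Moderate"))

# Show the Moderate node itself but prune its descendants.
trimmed <- vtree(FakeData, "Severity Sex",
                 prunebelow = list(Severity = "Moderate"))

kept
trimmed
```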