vtree: Draw a variable tree

Description

vtree is a tool for drawing variable trees. Variable trees display information about nested subsets of a data frame, in which the subsetting is defined by the values of categorical variables.

Usage

vtree(z, vars, showempty = FALSE, title = "", prune = list(),
  prunebelow = list(), keep = list(), follow = list(),
  labelnode = list(), labelvar = NULL, text = list(),
  check.is.na = FALSE, nodefunc = NULL, nodeargs = NULL,
  HTMLtext = FALSE, showlevels = TRUE, levelshape = "none",
  digits = 0, splitwidth = 20, color = c("blue", "forestgreen",
  "red", "orange", "pink"), colornodes = FALSE,
  fillcolor = rep(c("#EFF3FF", "#C6DBEF", "#9ECAE1", "#6BAED6",
  "#4292C6", "#2171B5", "#084594"), 8), fillnodes = TRUE,
  getscript = FALSE, nodesep = 0.5, ranksep = 0.5, vp = TRUE,
  root = 1, last = 1, top = TRUE, horiz = TRUE, rounded = TRUE,
  summary = "", leafonly = FALSE, width = NULL, height = NULL)

Arguments

z

Data frame, or a single vector.

vars

Either a character string of space-separated variable names or a vector of variable names. (Does not need to be specified if z is a vector.)

showempty

Show empty (N=0) nodes?

title

Title for the top node of the tree. (If not specified, "Sample" will be shown.)

prune

List of vectors to identify nodes to prune. The name of each element of the list is one of the variable names in vars. Each element is a vector of character strings identifying values of the variable (nodes) to prune.

prunebelow

Same as prune but the nodes themselves are not pruned, just their descendants.

keep

Which nodes should be kept (i.e. not pruned).

follow

Which nodes should be "followed". Here, "followed" means that the descendants of a node (and their descendants, etc.) will be shown.

labelnode

List of vectors used to change how variable values are displayed. The name of each element of the list is one of the variable names in vars. Each element of the list is a vector of character strings. The character strings represent the existing levels of the variable. The names of the vector represent the corresponding new levels of the variable.

labelvar

A named vector of labels for labeling nodes. Each element is a header for nodes of the corresponding variable.

text

A list of vectors used to specify extra text to add to certain nodes. (See Details for text formatting.)

check.is.na

Replace each variable with a logical vector indicating whether or not each of its values is missing. This is one case where vtree can use continuous variables.

nodefunc

A node function.

nodeargs

A list containing arguments for the node function.

HTMLtext

Is the text formatted in HTML?

showlevels

Show the names of the variables at each level?

levelshape

The shape of the boxes showing the levels of the tree.

digits

Number of decimal digits to show in percentages.

splitwidth

The minimum number of characters before an automatic linebreak is created.

color

A vector of color names for the outline of the nodes at each level (see Graphviz `color` attribute).

colornodes

Should the node boxes be colored?

fillcolor

A vector of color names for filling the nodes at each level (see Graphviz `fillcolor` attribute).

fillnodes

Should the nodes be filled with color?

getscript

Instead of displaying the flowchart, return the DOT script as a character string?

nodesep

Graphviz attribute: Node separation amount

ranksep

Graphviz attribute: Rank separation amount

vp

Use "valid percentages"? Valid percentages are computed by first excluding any missing values, i.e. restricting attention to the set of "valid" observations. The denominator is thus the number of non-missing observations. When vp=TRUE, nodes for missing values show the number of missing values but do not show a percentage. The other nodes show valid percentages. When vp=FALSE, all nodes (including nodes for missing values) show percentages of the total number of observations.
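The two percentage conventions can be checked by hand. The following is a minimal base R sketch of the arithmetic only. It does not call vtree, and the severity vector is made-up illustrative data, not part of the package.

```r
# Hypothetical data: 10 observations, 2 of them missing.
severity <- c("Mild", "Mild", "Mild", "Mild", "Moderate", "Moderate",
              "Severe", "Severe", NA, NA)

n_total   <- length(severity)        # all observations
n_missing <- sum(is.na(severity))    # missing observations
n_valid   <- n_total - n_missing     # "valid" (non-missing) observations

counts <- table(severity)            # table() drops NA by default

# vp = TRUE convention: denominator is the number of non-missing values.
valid_pct <- round(100 * counts / n_valid, 0)

# vp = FALSE convention: denominator is the total number of observations,
# and the missing-value node gets a percentage too.
total_pct   <- round(100 * counts / n_total, 0)
missing_pct <- round(100 * n_missing / n_total, 0)
```

With this vector, the Mild node would be 50% under the valid-percentage convention and 40% under the total-percentage convention.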

root

Is this the root node?

last

Is this the last node?

top

Is this the top?

horiz

Should the tree be drawn horizontally? (i.e. root node on the left, with the tree growing to the right)

rounded

Should the nodes have rounded boxes?

summary

A character string used to specify summary statistics to display in each node. The first word in the character string is the name of the variable to be summarized. The rest of the character string is the string that will be displayed, in which special codes are transformed into summary statistics, etc. See Details for the special codes.

leafonly

Should the summary information only be shown in leaf nodes?

width

Width to be passed to grViz.

height

Height to be passed to grViz.

Details

Special codes for the summary argument:

%mean% mean
%SD% standard deviation
%min% minimum
%max% maximum
%pX% Xth percentile, e.g. %p50% is the 50th percentile
%median% median, i.e. p50
%IQR% interquartile range, i.e. p25, p75
%list% list of the individual values
%mv% the number of missing values
%v% the name of the variable
%noroot% flag: Do not show summary in the root node.
%leafonly% flag: Only show summary in leaf nodes.
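
As an illustration of how a summary string is assembled from these codes, here is a hedged sketch. The data frame scores and its columns Group and Score are hypothetical names invented for this example. The vtree call runs only if the package is installed, and the base R line at the end just shows the per-group means one would expect the nodes to report.

```r
# Hypothetical data frame; Group and Score are illustrative names.
scores <- data.frame(
  Group = c("A", "A", "B", "B", "B"),
  Score = c(1, 3, 5, 7, 9)
)

# First word = variable to summarize; the remainder is the display
# template, in which %mean% and %SD% are replaced in each node.
score_summary <- "Score mean %mean%, SD %SD%"

# Only attempt the call when the vtree package is available.
if (requireNamespace("vtree", quietly = TRUE)) {
  tree <- vtree::vtree(scores, "Group", summary = score_summary)
}

# Per-group means computed directly, for comparison with the tree.
group_means <- tapply(scores$Score, scores$Group, mean)
```

Appending %leafonly% or %noroot% to the template restricts where the summary is shown, as listed above.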

Formatting codes for the text argument. Also used by labelnode and labelvar.

\n line break
*...* italics
**...** bold
^...^ superscript (using 10 point font)
~...~ subscript (using 10 point font)
%%red ...%% display text in red (or whichever color is specified)
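
A short sketch of the formatting codes in use with labelvar follows. It assumes that the names of the labelvar vector are variable names (as the argument description suggests) and that FakeData is available from the vtree package. The header text "at baseline" is invented for illustration.

```r
# Header built from the formatting codes: bold, a line break, then italics.
# "Severity" is the variable used in the package examples.
severity_header <- c(Severity = "**Severity**\n*at baseline*")

# Only attempt the call when the vtree package is available.
if (requireNamespace("vtree", quietly = TRUE)) {
  tree <- vtree::vtree(vtree::FakeData, "Severity",
                       labelvar = severity_header)
}
```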

Examples


# NOT RUN {
# A single-level hierarchy
vtree(FakeData,"Severity")

# A two-level hierarchy
vtree(FakeData,"Severity Sex")

# A two-level hierarchy with pruning of some values of Severity
vtree(FakeData,"Severity Sex",prune=list("Severity"=c("Moderate","NA")))

# Rename some nodes
vtree(FakeData,"Severity Sex",labelnode=list(Sex=c("Male"="M","Female"="F")))

# }
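
The getscript argument described above can be used to inspect the generated Graphviz source instead of drawing the tree. This is a minimal sketch, assuming the vtree package and its FakeData data set are installed. The variable name dot is arbitrary.

```r
dot <- NULL

# Only attempt the call when the vtree package is available.
if (requireNamespace("vtree", quietly = TRUE)) {
  # Return the DOT script as a character string rather than a flowchart.
  dot <- vtree::vtree(vtree::FakeData, "Severity Sex", getscript = TRUE)
  cat(dot)
}
```

The returned string can then be saved to a file or passed to another Graphviz renderer.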