metacoder (version 0.2.0)

heat_tree: Plot a taxonomic tree

Description

Plots the distribution of values associated with a taxonomic classification. Taxonomic classifications can have multiple roots, resulting in multiple trees on the same plot. Sizes and colors of nodes, edges, labels, and individual trees can be displayed relative to numbers (e.g. taxon statistics, such as abundance). The displayed range of colors and sizes can be explicitly defined or automatically generated. Various transformations can be applied to the numbers that sizes/colors are mapped to. Several types of tree layout algorithms from igraph can be used.

Usage

heat_tree(...)

# S3 method for Taxmap
heat_tree(.input, ...)

# S3 method for default
heat_tree(taxon_id, supertaxon_id,
  node_label = NA, edge_label = NA, tree_label = NA,
  node_size = 1, edge_size = node_size, node_label_size = node_size,
  edge_label_size = edge_size, tree_label_size = as.numeric(NA),
  node_color = "#999999", edge_color = node_color, tree_color = NA,
  node_label_color = "#000000", edge_label_color = "#000000",
  tree_label_color = "#000000",
  node_size_trans = "area", edge_size_trans = node_size_trans,
  node_label_size_trans = node_size_trans,
  edge_label_size_trans = edge_size_trans, tree_label_size_trans = "area",
  node_color_trans = "area", edge_color_trans = node_color_trans,
  tree_color_trans = "area", node_label_color_trans = "area",
  edge_label_color_trans = "area", tree_label_color_trans = "area",
  node_size_range = c(NA, NA), edge_size_range = c(NA, NA),
  node_label_size_range = c(NA, NA), edge_label_size_range = c(NA, NA),
  tree_label_size_range = c(NA, NA),
  node_color_range = quantative_palette(), edge_color_range = node_color_range,
  tree_color_range = quantative_palette(),
  node_label_color_range = quantative_palette(),
  edge_label_color_range = quantative_palette(),
  tree_label_color_range = quantative_palette(),
  node_size_interval = range(node_size, na.rm = TRUE, finite = TRUE),
  node_color_interval = NULL,
  edge_size_interval = range(edge_size, na.rm = TRUE, finite = TRUE),
  edge_color_interval = NULL,
  node_label_max = 500, edge_label_max = 500, tree_label_max = 500,
  overlap_avoidance = 1, margin_size = c(0, 0, 0, 0),
  layout = "reingold-tilford", initial_layout = "fruchterman-reingold",
  make_legend = TRUE, title = NULL, title_size = 0.08,
  node_color_axis_label = NULL, node_size_axis_label = NULL,
  edge_color_axis_label = NULL, edge_size_axis_label = NULL,
  background_color = "#FFFFFF00", output_file = NULL, aspect_ratio = 1,
  repel_labels = TRUE, repel_force = 1, repel_iter = 1000,
  verbose = FALSE, ...)

Arguments

...

(other named arguments) Passed to the igraph layout function used.

.input

An object of type taxmap

taxon_id

The unique ids of taxa.

supertaxon_id

The unique id of the supertaxon that each taxon_id is a part of.

node_label

See details on labels. Default: no labels.

edge_label

See details on labels. Default: no labels.

tree_label

See details on labels. The label to display above each graph. The value of the root of each graph will be used. Default: None.

node_size

See details on size. Default: constant size.

edge_size

See details on size. Default: relative to node size.

node_label_size

See details on size. Default: relative to node size.

edge_label_size

See details on size. Default: relative to edge size.

tree_label_size

See details on size. Default: relative to graph size.

node_color

See details on colors. Default: grey.

edge_color

See details on colors. Default: same as node color.

tree_color

See details on colors. The value of the root of each graph will be used. Overwrites the node and edge color if specified. Default: Not used.

node_label_color

See details on colors. Default: black.

edge_label_color

See details on colors. Default: black.

tree_label_color

See details on colors. Default: black.

node_size_trans

See details on transformations. Default: "area".

edge_size_trans

See details on transformations. Default: same as node_size_trans.

node_label_size_trans

See details on transformations. Default: same as node_size_trans.

edge_label_size_trans

See details on transformations. Default: same as edge_size_trans.

tree_label_size_trans

See details on transformations. Default: "area".

node_color_trans

See details on transformations. Default: "area".

edge_color_trans

See details on transformations. Default: same as node color transformation.

tree_color_trans

See details on transformations. Default: "area".

node_label_color_trans

See details on transformations. Default: "area".

edge_label_color_trans

See details on transformations. Default: "area".

tree_label_color_trans

See details on transformations. Default: "area".

node_size_range

See details on ranges. Default: Optimize to balance overlaps and range size.

edge_size_range

See details on ranges. Default: relative to node size range.

node_label_size_range

See details on ranges. Default: relative to node size.

edge_label_size_range

See details on ranges. Default: relative to edge size.

tree_label_size_range

See details on ranges. Default: relative to tree size.

node_color_range

See details on ranges. Default: Color-blind friendly palette.

edge_color_range

See details on ranges. Default: same as node color.

tree_color_range

See details on ranges. Default: Color-blind friendly palette.

node_label_color_range

See details on ranges. Default: Color-blind friendly palette.

edge_label_color_range

See details on ranges. Default: Color-blind friendly palette.

tree_label_color_range

See details on ranges. Default: Color-blind friendly palette.

node_size_interval

See details on intervals. Default: The range of values in node_size.

node_color_interval

See details on intervals. Default: The range of values in node_color.

edge_size_interval

See details on intervals. Default: The range of values in edge_size.

edge_color_interval

See details on intervals. Default: The range of values in edge_color.

node_label_max

The maximum number of node labels. Default: 500.

edge_label_max

The maximum number of edge labels. Default: 500.

tree_label_max

The maximum number of tree labels. Default: 500.

overlap_avoidance

(numeric) The relative importance of avoiding overlaps vs maximizing size range. Higher numbers will cause node size optimization to avoid overlaps more. Default: 1.

margin_size

(numeric of length 4) The horizontal and vertical margins, given as c(left, right, bottom, top). Default: c(0, 0, 0, 0).

layout

The layout algorithm used to position nodes. See details on layouts. Default: "reingold-tilford".

initial_layout

The layout algorithm used to set the initial position of nodes, passed as input to the layout algorithm. See details on layouts. Default: "fruchterman-reingold".

make_legend

if TRUE...

title

Name to print above the graph.

title_size

The size of the title relative to the rest of the graph.

node_color_axis_label

The label on the scale axis corresponding to node_color. Default: The expression given to node_color.

node_size_axis_label

The label on the scale axis corresponding to node_size. Default: The expression given to node_size.

edge_color_axis_label

The label on the scale axis corresponding to edge_color. Default: The expression given to edge_color.

edge_size_axis_label

The label on the scale axis corresponding to edge_size. Default: The expression given to edge_size.

background_color

The background color of the plot. Default: Transparent.

output_file

The path to one or more files to save the plot in using ggsave. The type of the file will be determined by the extension given. Default: Do not save plot.

aspect_ratio

The aspect ratio of the plot.

repel_labels

If TRUE (Default), use the ggrepel package to spread out labels.

repel_force

The force with which overlapping labels will be repelled from each other.

repel_iter

The number of iterations used when repelling labels.

verbose

If TRUE, print progress reports as the function runs.

Size

The size of nodes, edges, labels, and trees can be mapped to arbitrary numbers. This is useful for displaying statistics for taxa, such as abundance. Only the relative size of numbers is used, not the values themselves. They can be transformed to make the mapping non-linear using the transformation options. The range of actual sizes displayed on the graph can be set using the range options.

Accepts a numeric vector of the same length as taxon_id, or of a length that is a factor of the length of taxon_id.

Colors

The colors of nodes, edges, labels, and trees can be mapped to arbitrary numbers. This is useful for highlighting groups of taxa. Only the relative size of numbers is used, not the values themselves. They can be transformed to make the mapping non-linear using the transformation options. The range of actual colors displayed on the graph can be set using the range options.

Accepts a vector of the same length as taxon_id, or of a length that is a factor of the length of taxon_id. If a numeric vector is given, it is mapped to a color scale. Hex values or color names can be used (e.g. "#000000" or "black").
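As a rough illustration of how a numeric vector can be placed on a color scale, the sketch below uses only base R. The helper map_to_colors and its arguments are hypothetical and are not part of metacoder; the package's own mapping may differ, for example by applying a transformation first.

```r
# Hypothetical helper, not part of metacoder: map numbers onto a color gradient.
map_to_colors <- function(values, palette = c("grey", "blue"), n_steps = 100) {
  # Interpolate the palette into n_steps evenly spaced colors
  gradient <- grDevices::colorRampPalette(palette)(n_steps)
  span <- range(values, na.rm = TRUE, finite = TRUE)
  if (span[1] == span[2]) {
    # All values are equal, so every element gets the first color
    return(rep(gradient[1], length(values)))
  }
  # Rescale to 0..1, then pick the nearest gradient step
  proportion <- (values - span[1]) / (span[2] - span[1])
  gradient[round(proportion * (n_steps - 1)) + 1]
}

n_obs <- c(1, 5, 20, 100)
node_colors <- map_to_colors(n_obs, palette = c("#FFFFFF", "#000000"))
node_colors  # the smallest value gets the first palette color, the largest the last
```

Only the position of each value between the minimum and maximum matters here, which matches the statement above that relative values are used rather than the values themselves.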

Labels

The labels of nodes, edges, and trees can be added. Node labels are centered over their node. Edge labels are displayed over edges, in the same orientation. Tree labels are displayed over their tree.

Accepts a vector of the same length as taxon_id, or of a length that is a factor of the length of taxon_id.

Transformations

Before any numbers specified are mapped to color/size, they can be transformed to make the mapping non-linear. Any of the transformations listed below can be used by specifying their name. A customized function can also be supplied to do the transformation.

"linear"

Proportional to radius/diameter of node

"area"

circular area; better perceptual accuracy than "linear"

"log10"

Log base 10 of radius

"log2"

Log base 2 of radius

"ln"

Log base e of radius

"log10 area"

Log base 10 of circular area

"log2 area"

Log base 2 of circular area

"ln area"

Log base e of circular area
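The difference between reading a value as a radius and reading it as a circular area can be seen with a little arithmetic. The two functions below are illustrative assumptions written for this sketch; they are not metacoder's internal definitions of "linear" and "area".

```r
# Illustrative only: these are not metacoder's internal definitions.
as_radius_linear <- function(x) x             # value is read as a radius
as_radius_area   <- function(x) sqrt(x / pi)  # value is read as a circle's area

abundance <- c(1, 4, 16)

linear_radii <- as_radius_linear(abundance)
area_radii   <- as_radius_area(abundance)

linear_radii / linear_radii[1]  # radii grow 1 : 4 : 16
area_radii / area_radii[1]      # radii grow 1 : 2 : 4, so areas grow 1 : 4 : 16
```

With the area reading, a taxon with 16 times the abundance covers 16 times the ink rather than 256 times, which is the usual argument for its better perceptual accuracy.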

Ranges

The displayed range of colors and sizes can be explicitly defined or automatically generated. Size ranges are specified by supplying a numeric vector with two values: the minimum and maximum. The units used should be between 0 and 1, representing the proportion of a dimension of the graph. Since the dimensions of the graph are determined by layout, and not always square, the value that 1 corresponds to is the square root of the graph area (i.e. the side of a square with the same area as the plotted space). Color ranges can be any number of color values as either HEX codes (e.g. #000000) or color names (e.g. "black").
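The unit used for size ranges can be worked through by hand. The plot dimensions below are made-up numbers chosen for easy arithmetic; they are not produced by metacoder.

```r
# Hypothetical plot dimensions in arbitrary layout units (not taken from metacoder)
plot_width  <- 8
plot_height <- 2

# A size of 1 corresponds to the side of a square with the same area as the plot
unit_length <- sqrt(plot_width * plot_height)

# A size range given as proportions of that unit
node_size_range <- c(0.01, 0.1)
displayed_sizes <- node_size_range * unit_length
displayed_sizes  # 0.04 and 0.4 layout units
```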

Layout

Layouts determine the position of nodes on the graph. They are implemented using the igraph package. Any additional arguments passed to heat_tree are passed to the igraph function used. The following character values are understood:

"automatic"

Use nicely. Let igraph choose the layout.

"reingold-tilford"

Use as_tree. A circular tree-like layout.

"davidson-harel"

Use with_dh. A type of simulated annealing.

"gem"

Use with_gem. A force-directed layout.

"graphopt"

Use with_graphopt. A force-directed layout.

"mds"

Use with_mds. Multidimensional scaling.

"fruchterman-reingold"

Use with_fr. A force-directed layout.

"kamada-kawai"

Use with_kk. A layout based on a physical model of springs.

"large-graph"

Use with_lgl. Meant for larger graphs.

"drl"

Use with_drl. A force-directed layout.
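To get a feel for what a layout and an initial layout produce, igraph can be called directly. This is a minimal sketch that assumes igraph is installed and uses a small made-up tree; it shows igraph's own functions and is not a description of how heat_tree calls them internally.

```r
library(igraph)

# A small balanced tree standing in for a taxonomy: 7 nodes, 2 children per node
g <- make_tree(7, children = 2, mode = "undirected")

# Circular Reingold-Tilford positions, one row of x/y coordinates per node
tree_coords <- layout_as_tree(g, circular = TRUE)

# Force-directed layout that starts from the tree positions instead of random ones
seeded_coords <- layout_with_fr(g, coords = tree_coords)

dim(seeded_coords)  # one row per node, two columns (x and y)
```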

Intervals

This is the minimum and maximum of values displayed on the legend scales. Intervals are specified by supplying a numeric vector with two values: the minimum and maximum. These are defined in the same units as element size/color. By default, the minimum and maximum equal the range of the values used to infer size/color. Setting a custom interval is useful for making size/color in multiple graphs correspond to the same statistics, or for setting logical boundaries (such as c(0, 1) for proportions). Note that this is different from the "range" options, which determine the range of graphed sizes/colors.
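The distinction between an interval (in units of the statistic) and a range (in units of the plot) can be sketched with a plain linear rescaling. The helper rescale_to_range is hypothetical and ignores transformations; how metacoder treats values that fall outside a custom interval is not covered here.

```r
# Hypothetical helper, not part of metacoder: linearly rescale values from an
# interval of the statistic onto a range of displayed sizes.
rescale_to_range <- function(values, interval = range(values),
                             size_range = c(0.01, 0.1)) {
  proportion <- (values - interval[1]) / (interval[2] - interval[1])
  size_range[1] + proportion * (size_range[2] - size_range[1])
}

proportions <- c(0.2, 0.4, 0.6)

# Default interval: the observed minimum (0.2) maps to the smallest size
default_sizes <- rescale_to_range(proportions)

# Custom interval c(0, 1): only a value of 0 would map to the smallest size,
# so plots of different datasets share the same scale
fixed_sizes <- rescale_to_range(proportions, interval = c(0, 1))
```

With the default interval the three values span the whole size range; with c(0, 1) they occupy only the part of it that their position between 0 and 1 justifies.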

Acknowledgements

This package includes code from the R package ggrepel to handle label overlap avoidance, with permission from the author of ggrepel, Kamil Slowikowski. We included the code instead of depending on ggrepel because we are using functions internal to ggrepel that might change in the future. We thank Kamil Slowikowski for letting us use his code and would like to acknowledge his implementation of the label overlap avoidance used in metacoder.

Examples

# NOT RUN {
# Parse dataset for plotting
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "info", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")
                   
# Default appearance:
#  No parameters are needed, but the default tree is not too useful
heat_tree(x)

# A good place to start:
#  There will always be "taxon_names" and "n_obs" variables, so this is a 
#  good place to start. This will show the number of OTUs in this case. 
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs)

# Plotting read depth:
#  To plot read depth, you first need to add up the number of reads per taxon.
#  The function `calc_taxon_abund` is good for this. 
x$data$taxon_counts <- calc_taxon_abund(x, dataset = "tax_data")
x$data$taxon_counts$total <- rowSums(x$data$taxon_counts[, -1]) # -1 = taxon_id column
heat_tree(x, node_label = taxon_names, node_size = total, node_color = total)

# Plotting multiple variables:
#  You can plot up to 4 quantitative variables using node/edge size/color, but
#  it is usually best to use 2 or 3. The plot below uses node size for number of
#  OTUs, node color for number of reads, and edge color for number of samples
x$data$taxon_counts <- calc_n_samples(x, dataset = "taxon_counts", append = TRUE)
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = total,
          edge_color = n_samples)

# Different layouts:
#  You can use any layout implemented by igraph. You can also specify an
#  initial layout to seed the main layout with.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          layout = "davidson-harel")
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          layout = "davidson-harel", initial_layout = "reingold-tilford")

# Axis labels:
#  You can add custom labels to the legends
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = total,
          edge_color = n_samples, node_size_axis_label = "Number of OTUs", 
          node_color_axis_label = "Number of reads",
          edge_color_axis_label = "Number of samples")
          
# Overlap avoidance:
#  You can change how much node overlap avoidance is used.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          overlap_avoidance = .5)
          
# Label overlap avoidance
#  You can modify how label scattering is handled using the `repel_force` and
#  `repel_iter` options. You can turn off label scattering using the `repel_labels` option.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          repel_force = 2, repel_iter = 20000)
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          repel_labels = FALSE)

# Setting the size of graph elements: 
#  You can force nodes, edges, and labels to be a specific size/color range instead
#  of letting the function optimize it. These options end in `_range`.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          node_size_range = c(0.01, .1))
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          edge_color_range = c("black", "#FFFFFF"))
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          node_label_size_range = c(0.02, 0.02))

# Setting the transformation used:
#  You can change how raw statistics are converted to color/size using options
#  ending in _trans.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          node_size_trans = "log10 area")

# Setting the interval displayed:
#  By default, the whole range of the statistic provided will be displayed.
#  You can set what range of values are displayed using options ending in `_interval`.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          node_size_interval = c(10, 100))

# }