metacoder (version 0.3.0.1)

heat_tree: Plot a taxonomic tree

Description

Plots the distribution of values associated with a taxonomic classification/hierarchy. Taxonomic classifications can have multiple roots, resulting in multiple trees on the same plot. A tree consists of elements, element properties, conditions, and mapping properties, which are represented as parameters in the heat_tree object. The elements (e.g. nodes, edges, labels, and individual trees) are the infrastructure of the heat tree. The element properties (e.g. size and color) are characteristics that are manipulated by various data conditions and mapping properties. The element properties can be explicitly defined or automatically generated. The conditions are data (e.g. taxon statistics, such as abundance) represented in the taxmap/metacoder object. The mapping properties are parameters (e.g. transformations, range, interval, and layout) used to change the elements/element properties and how they are used to represent (or not represent) the various conditions.

Usage

heat_tree(...)

# S3 method for Taxmap
heat_tree(.input, ...)

# S3 method for default
heat_tree(taxon_id, supertaxon_id,
  node_label = NA, edge_label = NA, tree_label = NA,
  node_size = 1, edge_size = node_size,
  node_label_size = node_size, edge_label_size = edge_size,
  tree_label_size = as.numeric(NA),
  node_color = "#999999", edge_color = node_color, tree_color = NA,
  node_label_color = "#000000", edge_label_color = "#000000",
  tree_label_color = "#000000",
  node_size_trans = "area", edge_size_trans = node_size_trans,
  node_label_size_trans = node_size_trans,
  edge_label_size_trans = edge_size_trans,
  tree_label_size_trans = "area",
  node_color_trans = "area", edge_color_trans = node_color_trans,
  tree_color_trans = "area",
  node_label_color_trans = "area", edge_label_color_trans = "area",
  tree_label_color_trans = "area",
  node_size_range = c(NA, NA), edge_size_range = c(NA, NA),
  node_label_size_range = c(NA, NA), edge_label_size_range = c(NA, NA),
  tree_label_size_range = c(NA, NA),
  node_color_range = quantative_palette(),
  edge_color_range = node_color_range,
  tree_color_range = quantative_palette(),
  node_label_color_range = quantative_palette(),
  edge_label_color_range = quantative_palette(),
  tree_label_color_range = quantative_palette(),
  node_size_interval = range(node_size, na.rm = TRUE, finite = TRUE),
  node_color_interval = NULL,
  edge_size_interval = range(edge_size, na.rm = TRUE, finite = TRUE),
  edge_color_interval = NULL,
  node_label_max = 500, edge_label_max = 500, tree_label_max = 500,
  overlap_avoidance = 1, margin_size = c(0, 0, 0, 0),
  layout = "reingold-tilford", initial_layout = "fruchterman-reingold",
  make_node_legend = TRUE, make_edge_legend = TRUE,
  title = NULL, title_size = 0.08,
  node_color_axis_label = NULL, node_size_axis_label = NULL,
  edge_color_axis_label = NULL, edge_size_axis_label = NULL,
  background_color = "#FFFFFF00", output_file = NULL, aspect_ratio = 1,
  repel_labels = TRUE, repel_force = 1, repel_iter = 1000,
  verbose = FALSE, ...)

Arguments

...

(other named arguments) Passed to the igraph layout function used.

.input

An object of type taxmap.

taxon_id

The unique ids of taxa.

supertaxon_id

The unique id of the supertaxon that each taxon_id is a part of.

node_label

See details on labels. Default: no labels.

edge_label

See details on labels. Default: no labels.

tree_label

See details on labels. The label to display above each graph. The value of the root of each graph will be used. Default: None.

node_size

See details on size. Default: constant size.

edge_size

See details on size. Default: relative to node size.

node_label_size

See details on size. Default: relative to node size.

edge_label_size

See details on size. Default: relative to edge size.

tree_label_size

See details on size. Default: relative to graph size.

node_color

See details on colors. Default: grey.

edge_color

See details on colors. Default: same as node color.

tree_color

See details on colors. The value of the root of each graph will be used. Overwrites the node and edge color if specified. Default: Not used.

node_label_color

See details on colors. Default: black.

edge_label_color

See details on colors. Default: black.

tree_label_color

See details on colors. Default: black.

node_size_trans

See details on transformations. Default: "area".

edge_size_trans

See details on transformations. Default: same as node_size_trans.

node_label_size_trans

See details on transformations. Default: same as node_size_trans.

edge_label_size_trans

See details on transformations. Default: same as edge_size_trans.

tree_label_size_trans

See details on transformations. Default: "area".

node_color_trans

See details on transformations. Default: "area".

edge_color_trans

See details on transformations. Default: same as node color transformation.

tree_color_trans

See details on transformations. Default: "area".

node_label_color_trans

See details on transformations. Default: "area".

edge_label_color_trans

See details on transformations. Default: "area".

tree_label_color_trans

See details on transformations. Default: "area".

node_size_range

See details on ranges. Default: Optimize to balance overlaps and range size.

edge_size_range

See details on ranges. Default: relative to node size range.

node_label_size_range

See details on ranges. Default: relative to node size.

edge_label_size_range

See details on ranges. Default: relative to edge size.

tree_label_size_range

See details on ranges. Default: relative to tree size.

node_color_range

See details on ranges. Default: Color-blind friendly palette.

edge_color_range

See details on ranges. Default: same as node color.

tree_color_range

See details on ranges. Default: Color-blind friendly palette.

node_label_color_range

See details on ranges. Default: Color-blind friendly palette.

edge_label_color_range

See details on ranges. Default: Color-blind friendly palette.

tree_label_color_range

See details on ranges. Default: Color-blind friendly palette.

node_size_interval

See details on intervals. Default: The range of values in node_size.

node_color_interval

See details on intervals. Default: The range of values in node_color.

edge_size_interval

See details on intervals. Default: The range of values in edge_size.

edge_color_interval

See details on intervals. Default: The range of values in edge_color.

node_label_max

The maximum number of node labels. Default: 500.

edge_label_max

The maximum number of edge labels. Default: 500.

tree_label_max

The maximum number of tree labels. Default: 500.

overlap_avoidance

(numeric) The relative importance of avoiding overlaps vs maximizing size range. Higher numbers will cause node size optimization to avoid overlaps more. Default: 1.

margin_size

(numeric of length 4) The margins around the plot, given as c(left, right, bottom, top). Default: 0, 0, 0, 0.

layout

The layout algorithm used to position nodes. See details on layouts. Default: "reingold-tilford".

initial_layout

The layout algorithm used to set the initial position of nodes, passed as input to the layout algorithm. See details on layouts. Default: "fruchterman-reingold".

make_node_legend

If TRUE, make a legend for node size/color mappings. Default: TRUE.

make_edge_legend

If TRUE, make a legend for edge size/color mappings. Default: TRUE.

title

Name to print above the graph.

title_size

The size of the title relative to the rest of the graph.

node_color_axis_label

The label on the scale axis corresponding to node_color. Default: The expression given to node_color.

node_size_axis_label

The label on the scale axis corresponding to node_size. Default: The expression given to node_size.

edge_color_axis_label

The label on the scale axis corresponding to edge_color. Default: The expression given to edge_color.

edge_size_axis_label

The label on the scale axis corresponding to edge_size. Default: The expression given to edge_size.

background_color

The background color of the plot. Default: Transparent.

output_file

The path to one or more files to save the plot in using ggsave. The type of the file will be determined by the extension given. Default: Do not save plot.

aspect_ratio

The aspect ratio of the plot. Default: 1.

repel_labels

If TRUE (Default), use the ggrepel package to spread out labels.

repel_force

The force with which overlapping labels are repelled from each other.

repel_iter

The number of iterations used when repelling labels.

verbose

If TRUE, print progress reports as the function runs.

labels

The labels of nodes, edges, and trees can be added. Node labels are centered over their node. Edge labels are displayed over edges, in the same orientation. Tree labels are displayed over their tree.

Accepts a vector that is the same length as taxon_id, or whose length is a factor of the length of taxon_id.

sizes

The size of nodes, edges, labels, and trees can be mapped to various conditions. This is useful for displaying statistics for taxa, such as abundance. Only the relative size of the condition is used, not the values themselves. The <element>_size_trans (transformation) parameter can be used to make the size mapping non-linear. The <element>_size_range parameter can be used to proportionately change the size of an element based on the condition mapped to that element. The <element>_size_interval parameter can be used to change the limit at which a condition will be graphically represented as the same size as the minimum/maximum <element>_size_range.

Accepts a numeric vector that is the same length as taxon_id, or whose length is a factor of the length of taxon_id.

colors

The colors of nodes, edges, labels, and trees can be mapped to various conditions. This is useful for visually highlighting/clustering groups of taxa. Only the relative size of the condition is used, not the values themselves. The <element>_color_trans (transformation) parameter can be used to make the color mapping non-linear. The <element>_color_range parameter can be used to proportionately change the color of an element based on the condition mapped to that element. The <element>_color_interval parameter can be used to change the limit at which a condition will be graphically represented as the same color as the minimum/maximum <element>_color_range.

Accepts a vector that is the same length as taxon_id, or whose length is a factor of the length of taxon_id. If a numeric vector is given, it is mapped to a color scale. Hex values or color names can be used (e.g. "#000000" or "black").
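The mapping of a numeric condition onto a color range can be pictured with base R alone. The sketch below is only an illustration of the idea, not metacoder's internal code: the helper name sketch_color_map is hypothetical, and it assumes a plain linear mapping (no transformation) over the full range of the supplied values.

```r
# Hypothetical helper, not part of metacoder: linearly map numeric values
# onto a color range using grDevices (shipped with base R).
sketch_color_map <- function(values, color_range = c("grey", "yellow", "red")) {
  # Scale the values to proportions between 0 and 1.
  span <- range(values, na.rm = TRUE, finite = TRUE)
  proportion <- (values - span[1]) / (span[2] - span[1])
  # colorRamp() returns a function that interpolates between the colors
  # and yields a matrix of red/green/blue values on a 0-255 scale.
  ramp <- grDevices::colorRamp(color_range)
  rgb_matrix <- ramp(proportion)
  # Convert each row of the matrix to a hex color string.
  grDevices::rgb(rgb_matrix, maxColorValue = 255)
}

# The smallest value gets the first color, the largest gets the last.
cols <- sketch_color_map(c(0, 5, 10), color_range = c("black", "#FFFFFF"))
```

Here cols[1] is black and cols[3] is white, with an intermediate grey in between, which mirrors how a two-color <element>_color_range spreads a statistic across a gradient.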

Mapping Properties

transformations

Before any conditions specified are mapped to an element property (color/size), they can be transformed to make the mapping non-linear. Any of the transformations listed below can be used by specifying their name. A customized function can also be supplied to do the transformation.

"linear"

Proportional to radius/diameter of node

"area"

circular area; better perceptual accuracy than "linear"

"log10"

Log base 10 of radius

"log2"

Log base 2 of radius

"ln"

Log base e of radius

"log10 area"

Log base 10 of circular area

"log2 area"

Log base 2 of circular area

"ln area"

Log base e of circular area
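As a rough illustration of what a transformation does before mapping, the sketch below applies a named or custom transformation to a vector of values. It is not metacoder's implementation: the helper name sketch_transform is hypothetical, only three of the named options are shown, and it assumes that "area" means the value is treated as a circle's area and converted to the corresponding radius.

```r
# Hypothetical helper, not part of metacoder: apply a named or custom
# transformation to a numeric condition before it is mapped to size/color.
sketch_transform <- function(x, trans = "linear") {
  # A user-supplied function is applied directly.
  if (is.function(trans)) {
    return(trans(x))
  }
  switch(trans,
    "linear" = x,             # value is proportional to the radius
    "area"   = sqrt(x / pi),  # assumption: value is a circle area, return its radius
    "log10"  = log10(x),      # log base 10 of the value
    stop("transformation not covered by this sketch: ", trans)
  )
}

radius_from_area <- sketch_transform(c(pi, 4 * pi), "area")
log_scaled       <- sketch_transform(c(10, 100, 1000), "log10")
custom_scaled    <- sketch_transform(c(1, 4, 9), function(v) v^2)
```

Under the "area" assumption, a value four times larger yields a radius only twice as large, which is why area scaling tends to look more proportionate than "linear" for circular nodes.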

ranges

The displayed range of colors and sizes can be explicitly defined or automatically generated. When explicitly set, the size range will proportionately increase/decrease the size of a particular element. Size ranges are specified by supplying a numeric vector with two values: the minimum and maximum. The values should be between 0 and 1, representing the proportion of a dimension of the graph. Since the dimensions of the graph are determined by the layout, and are not always square, the value that 1 corresponds to is the square root of the graph area (i.e. the side of a square with the same area as the plotted space). Color ranges can be any number of color values, given as either hex codes (e.g. "#000000") or color names (e.g. "black").
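The unit used for size ranges can be checked with a little arithmetic. The numbers below are made up for illustration (a plotted space 8 units wide and 2 units tall); only the square-root-of-area rule comes from the description above.

```r
# Illustrative plot dimensions (hypothetical values, arbitrary units).
plot_width  <- 8
plot_height <- 2

# A size of 1 corresponds to the side of a square with the same area
# as the plotted space.
unit_length <- sqrt(plot_width * plot_height)

# A size range is given as proportions of that unit length.
node_size_range <- c(0.01, 0.1)
node_size_in_plot_units <- node_size_range * unit_length
```

With these numbers the unit length is 4, so node_size_range = c(0.01, 0.1) corresponds to sizes between 0.04 and 0.4 in plot units.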

layout

Layouts determine the position of node elements on the graph. They are implemented using the igraph package. Any additional arguments passed to heat_tree are passed to the igraph function used. The following character values are understood:

"automatic"

Use nicely. Let igraph choose the layout.

"reingold-tilford"

Use as_tree. A circular tree-like layout.

"davidson-harel"

Use with_dh. A type of simulated annealing.

"gem"

Use with_gem. A force-directed layout.

"graphopt"

Use with_graphopt. A force-directed layout.

"mds"

Use with_mds. Multidimensional scaling.

"fruchterman-reingold"

Use with_fr. A force-directed layout.

"kamada-kawai"

Use with_kk. A layout based on a physical model of springs.

"large-graph"

Use with_lgl. Meant for larger graphs.

"drl"

Use with_drl. A force-directed layout.

intervals

Intervals are the minimum and maximum values displayed on the legend scales. They are specified by supplying a numeric vector with two values: the minimum and maximum. When explicitly set, the <element>_<property>_interval redefines how the conditional values are represented by setting limits for the <element>_<property>. Any condition below the minimum of the interval is graphically represented the same as a condition at that minimum, and any condition above the maximum of the interval is represented the same as a condition at that maximum. By default, the minimum and maximum equal the range of the values supplied to the <element>_<property>. Setting a custom interval is useful for making <element>_<properties> in multiple graphs correspond to the same conditions, or for setting logical boundaries (such as c(0, 1) for proportions). Note that this is different from the <element>_<property>_range mapping property, which determines the size/color of graphed elements.
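The difference between an interval (limits on the data values) and a range (limits on the displayed size) can be sketched in base R. This is an illustration under simplifying assumptions, not metacoder's code: the helper name sketch_rescale is hypothetical, the mapping is linear, and the interval is assumed to have distinct minimum and maximum values.

```r
# Hypothetical helper, not part of metacoder: clamp values to an interval,
# then rescale them linearly onto a displayed size range.
sketch_rescale <- function(values,
                           interval = range(values, na.rm = TRUE, finite = TRUE),
                           size_range = c(0.01, 0.05)) {
  # Values outside the interval are treated as if they sat on its limits.
  clamped <- pmin(pmax(values, interval[1]), interval[2])
  # Position of each value within the interval, from 0 to 1.
  proportion <- (clamped - interval[1]) / (interval[2] - interval[1])
  # Map that position onto the displayed size range.
  size_range[1] + proportion * (size_range[2] - size_range[1])
}

n_obs <- c(1, 10, 55, 100, 400)
sizes <- sketch_rescale(n_obs, interval = c(10, 100), size_range = c(0.01, 0.1))
```

In this sketch the values 1 and 10 both map to the minimum size 0.01, and 100 and 400 both map to the maximum size 0.1, while 55 lands halfway at 0.055. Reusing the same interval across several plots keeps a given size tied to the same data value in each of them.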

Acknowledgements

This package includes code from the R package ggrepel to handle label overlap avoidance with permission from the author of ggrepel Kamil Slowikowski. We included the code instead of depending on ggrepel because we are using internal functions to ggrepel that might change in the future. We thank Kamil Slowikowski for letting us use his code and would like to acknowledge his implementation of the label overlap avoidance used in metacoder.

Examples

# NOT RUN {
# Parse dataset for plotting
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "info", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")
                   
# Default appearance:
#  No parameters are needed, but the default tree is not too useful
heat_tree(x)

# A good place to start:
#  There will always be "taxon_names" and "n_obs" variables, so this is a 
#  good place to start. This will show the number of OTUs in this case. 
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs)

# Plotting read depth:
#  To plot read depth, you first need to add up the number of reads per taxon.
#  The function `calc_taxon_abund` is good for this. 
x$data$taxon_counts <- calc_taxon_abund(x, data = "tax_data")
x$data$taxon_counts$total <- rowSums(x$data$taxon_counts[, -1]) # -1 = taxon_id column
heat_tree(x, node_label = taxon_names, node_size = total, node_color = total)

# Plotting multiple variables:
#  You can plot up to 4 quantitative variables using node/edge size/color, but it
#  is usually best to use 2 or 3. The plot below uses node size for number of
#  OTUs, node color for number of reads, and edge color for number of samples
x$data$n_samples <- calc_n_samples(x, data = "taxon_counts")
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = total,
          edge_color = n_samples)

# Different layouts:
#  You can use any layout implemented by igraph. You can also specify an
#  initial layout to seed the main layout with.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          layout = "davidson-harel")
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          layout = "davidson-harel", initial_layout = "reingold-tilford")

# Axis labels:
#  You can add custom labels to the legends
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = total,
          edge_color = n_samples, node_size_axis_label = "Number of OTUs", 
          node_color_axis_label = "Number of reads",
          edge_color_axis_label = "Number of samples")
          
# Overlap avoidance:
#  You can change how much node overlap avoidance is used.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          overlap_avoidance = .5)
          
# Label overlap avoidance
#  You can modify how label scattering is handled using the `repel_force` and
#  `repel_iter` options. You can turn off label scattering using the `repel_labels` option.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          repel_force = 2, repel_iter = 20000)
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          repel_labels = FALSE)

# Setting the size of graph elements: 
#  You can force nodes, edges, and labels to be a specific size/color range instead
#  of letting the function optimize it. These options end in `_range`.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          node_size_range = c(0.01, .1))
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          edge_color_range = c("black", "#FFFFFF"))
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          node_label_size_range = c(0.02, 0.02))

# Setting the transformation used:
#  You can change how raw statistics are converted to color/size using options
#  ending in _trans.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          node_size_trans = "log10 area")

# Setting the interval displayed:
#  By default, the whole range of the statistic provided will be displayed.
#  You can set what range of values are displayed using options ending in `_interval`.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          node_size_interval = c(10, 100))

# }
