plot_richness: Plot alpha diversity, flexibly with ggplot2

Description

There are many useful examples of alpha-diversity graphics in the phyloseq online tutorials (http://joey711.github.io/phyloseq/plot_richness-examples). This function estimates a number of alpha-diversity metrics using the estimate_richness function, and returns a ggplot plotting object. The plot generated by this function will include every sample in physeq, but the samples can be further grouped on the horizontal axis through the argument x, and shaded according to the argument color (see below). You must use untrimmed, non-normalized count data for meaningful results, as many of these estimates are highly dependent on the number of singletons. You can always trim the data later on if needed, just not before using this function.
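As a rough sanity check on the count-data requirement above, the following sketch inspects an OTU table before plotting. It assumes the phyloseq package is installed and uses the bundled soilrep dataset; the helper variable names (counts, is_integer_counts, singletons_per_sample) are illustrative only and not part of phyloseq.

```r
library(phyloseq)

data("soilrep")

# Extract the abundance table as a plain matrix, oriented taxa-by-samples
counts <- as(otu_table(soilrep), "matrix")
if (!taxa_are_rows(soilrep)) {
  counts <- t(counts)
}

# Non-normalized data should consist of whole-number counts
is_integer_counts <- all(counts == round(counts))

# Number of taxa observed exactly once in each sample (singletons);
# aggressive trimming typically drives these toward zero
singletons_per_sample <- colSums(counts == 1)

is_integer_counts
summary(singletons_per_sample)
```

If is_integer_counts is FALSE, or every sample has zero singletons, the data may already have been normalized or trimmed, and estimators such as Chao1 and ACE are unlikely to be meaningful.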

Usage

plot_richness(physeq, x = "samples", color = NULL, shape = NULL,
  title = NULL, scales = "free_y", nrow = 1, shsi = NULL,
  measures = NULL, sortby = NULL)

Arguments

physeq

(Required). phyloseq-class, or alternatively, an otu_table-class. The data about which you want to estimate.

x

(Optional). A variable to map to the horizontal axis. The vertical axis will be mapped to the alpha diversity index/estimate and have units of total taxa, and/or index value (dimensionless). This parameter (x) can be either a character string indicating a variable in sample_data (among the set returned by sample_variables(physeq)), or a custom supplied vector with length equal to the number of samples in the dataset (nsamples(physeq)).

The default value is "samples", which will map each sample's name to a separate horizontal position in the plot.

color

(Optional). Default NULL. The sample variable to map to different colors. Like x, this can be a single character string of the variable name in sample_data (among the set returned by sample_variables(physeq)), or a custom supplied vector with length equal to the number of samples in the dataset (nsamples(physeq)). The color scheme is chosen automatically by ggplot, but it can be modified afterward with an additional layer using scale_color_manual.
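A minimal sketch of overriding the automatic color scheme follows. It assumes phyloseq and ggplot2 are installed, and that the warmed variable in soilrep has exactly two levels; the two color names are arbitrary choices for illustration.

```r
library(phyloseq)
library(ggplot2)

data("soilrep")

# Group samples by "Treatment" on the x-axis and color them by "warmed"
p <- plot_richness(soilrep, x = "Treatment", color = "warmed",
                   measures = c("Chao1", "InvSimpson"))

# Replace the default colors with a manual scale
# (two values, assuming "warmed" has two levels)
p + scale_color_manual(values = c("steelblue", "firebrick"))
```

The same pattern should apply to shape with scale_shape_manual, since the returned object is an ordinary ggplot object that accepts additional layers.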

shape

(Optional). Default NULL. The sample variable to map to different shapes. Like x and color, this can be a single character string of the variable name in sample_data (among the set returned by sample_variables(physeq)), or a custom supplied vector with length equal to the number of samples in the dataset (nsamples(physeq)). The shape scale is chosen automatically by ggplot, but it can be modified afterward with an additional layer using scale_shape_manual.

title

(Optional). Default NULL. Character string. The main title for the graphic.

scales

(Optional). Default "free_y". Whether to let vertical axis have free scale that adjusts to the data in each panel. This argument is passed to facet_wrap. If set to "fixed", a single vertical scale will be used in all panels. This can obscure values if the measures argument includes both richness estimates and diversity indices, for example.

nrow

(Optional). Default is 1, meaning that all plot panels will be placed in a single row, side-by-side. This argument is passed to facet_wrap. If NULL, the number of rows and columns will be chosen automatically (wrapped) based on the number of panels and the size of the graphics device.

shsi

(Deprecated). No longer supported. Instead see `measures` below.

measures

(Optional). Default is NULL, meaning that all available alpha-diversity measures will be included in plot panels. Alternatively, you can specify one or more measures as a character vector of measure names. Values must be among those supported: c("Observed", "Chao1", "ACE", "Shannon", "Simpson", "InvSimpson", "Fisher").

sortby

(Optional). A character vector naming one or more of the measures given in the measures argument. If the x-axis is mapped to a discrete variable, the x positions are sorted by the mean of these measures. Default is NULL, implying that a discrete-value horizontal axis will use default sorting, usually alphabetic.

Value

A ggplot plot object summarizing the richness estimates and, where available, their standard errors.

Details

NOTE: Because this plotting function incorporates the output from estimate_richness, the variable names of that output should not be used as x or color. Even if doing so works, the resulting plot might look strange and would not reflect the intended behavior of this function. The following are the names you will want to avoid using in x or color:

c("Observed", "Chao1", "ACE", "Shannon", "Simpson", "InvSimpson", "Fisher").
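One way to check for such a conflict ahead of time is sketched below. It assumes phyloseq is installed and uses the bundled GlobalPatterns dataset; reserved and clashes are illustrative variable names, not phyloseq objects.

```r
library(phyloseq)

data("GlobalPatterns")

# Measure names listed above, which also appear as column names
# in the estimate_richness output
reserved <- c("Observed", "Chao1", "ACE", "Shannon",
              "Simpson", "InvSimpson", "Fisher")

# Sample variables whose names collide with a measure name
clashes <- intersect(sample_variables(GlobalPatterns), reserved)

if (length(clashes) > 0) {
  warning("Sample variables that clash with measure names: ",
          paste(clashes, collapse = ", "))
}
```

If any clashes are reported, renaming the affected columns in sample_data before calling plot_richness is probably the simplest workaround.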

Examples

## There are many more interesting examples at the phyloseq online tutorials.
## http://joey711.github.io/phyloseq/plot_richness-examples
data("soilrep")
plot_richness(soilrep, measures=c("InvSimpson", "Fisher"))
plot_richness(soilrep, "Treatment", "warmed", measures=c("Chao1", "ACE", "InvSimpson"), nrow=3)
data("GlobalPatterns")
plot_richness(GlobalPatterns, x="SampleType", measures=c("InvSimpson"))
plot_richness(GlobalPatterns, x="SampleType", measures=c("Chao1", "ACE", "InvSimpson"), nrow=3)
plot_richness(GlobalPatterns, x="SampleType", measures=c("Chao1", "ACE", "InvSimpson"), nrow=3, sortby = "Chao1")
