An R package for metabarcoding research planning and analysis

Metacoder is an R package for reading, plotting, and manipulating large taxonomic data sets, such as those generated by modern high-throughput sequencing methods like metabarcoding (i.e., amplification metagenomics, 16S metagenomics, etc.). It provides a tree-based visualization called “heat trees” that depicts statistics for every taxon in a taxonomy using color and size. It also provides functions for common tasks in microbiome bioinformatics on data in the taxmap format defined by the taxa package, such as:

  • Summing read counts/abundance per taxon
  • Converting counts to proportions and rarefaction of counts using vegan
  • Comparing the abundance (or other characteristics) of groups of samples (e.g., experimental treatments) per taxon
  • Combining data for groups of samples
  • Simulated PCR, via EMBOSS primersearch, for testing primer specificity and coverage of taxonomic groups
  • Converting common microbiome formats for data and reference databases into the objects defined by the taxa package
  • Converting to and from the phyloseq format and the taxa format
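
As a rough illustration of a few of the tasks listed above, here is a minimal sketch using the example data sets hmp_otus and hmp_samples that appear in the function index of this package. It assumes that hmp_otus has a "lineage" column of semicolon-separated classifications and that hmp_samples has a "sample_id" column naming the count columns. The data set names "tax_props" and "tax_abund" are arbitrary choices made for this example. See the documentation for the exact arguments supported by your installed version.

```r
library(metacoder)

# Parse the example HMP OTU table bundled with metacoder into a taxmap object.
# The "lineage" column holds semicolon-separated classifications; rank prefixes
# such as "p__" are left in the taxon names to keep this sketch simple.
obj <- parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";")

# Convert per-sample read counts to proportions (stored under a new,
# arbitrarily chosen name).
obj$data$tax_props <- calc_obs_props(obj, "tax_data",
                                     cols = hmp_samples$sample_id)

# Sum the proportions of all OTUs assigned to each taxon.
obj$data$tax_abund <- calc_taxon_abund(obj, "tax_props",
                                       cols = hmp_samples$sample_id)

# Draw a heat tree: node size and color both show the number of OTUs per taxon.
tree_plot <- heat_tree(obj,
                       node_label = taxon_names,
                       node_size = n_obs,
                       node_color = n_obs)
print(tree_plot)
```

Storing the derived tables under new names leaves the original "tax_data" table untouched, so later steps can still refer to the raw counts.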

Installation

This project is available on CRAN and can be installed like so:

install.packages("metacoder")

You can also install the development version for the newest features, bugs, and bug fixes:

install.packages("devtools")
devtools::install_github("grunwaldlab/metacoder")

Documentation

All the documentation for metacoder can be found on our website here:

https://grunwaldlab.github.io/metacoder_documentation/

Dependencies

The function that simulates PCR requires primersearch from the EMBOSS tool kit to be installed. This is not an R package, so it is not automatically installed. Type ?primersearch after installing and loading metacoder for installation instructions.
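
Before running the PCR simulation, it can help to confirm that the primersearch executable is visible to R. The sketch below uses only base R and assumes the program is installed under the name "primersearch" and is on the system PATH. The variable names are made up for this example. The function index also lists primersearch_is_installed, but its arguments are not shown here, so it is not used in this sketch.

```r
# Look for the EMBOSS primersearch executable on the system PATH.
# Sys.which() returns an empty string when the program cannot be found.
primersearch_path <- Sys.which("primersearch")
emboss_found <- unname(nzchar(primersearch_path))

if (emboss_found) {
  message("primersearch found at: ", primersearch_path)
} else {
  message("primersearch not found; install EMBOSS before simulating PCR.")
}
```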

Relationship with other packages

Many of these operations can be done using other packages like phyloseq, which also provides tools for diversity analysis. The main strength of metacoder is that its functions use the flexible data types defined by taxa, which has powerful parsing and subsetting abilities that take into account the hierarchical relationship between taxa and user-defined data. In general, metacoder and taxa are more of an abstracted tool kit, whereas phyloseq has more specialized functions for community diversity data, but they both can do similar things. I encourage you to try both to see which fits your needs and style best. You can also combine the two in a single analysis by converting between the two data types when needed.

Citation

If you use metacoder in a publication, please cite our article in PLOS Computational Biology:

Foster ZSL, Sharpton TJ, Grünwald NJ (2017) Metacoder: An R package for visualization and manipulation of community taxonomic diversity data. PLOS Computational Biology 13(2): e1005404. https://doi.org/10.1371/journal.pcbi.1005404

Future development

Metacoder is under active development and many new features are planned. Some improvements that are being explored include:

  • Barcoding gap analysis and associated plotting functions
  • A function to aid in retrieving appropriate sequence data from NCBI for in silico PCR from whole genome sequences
  • Graphing of different node shapes in heat trees, possibly including pie graphs or PhyloPics
  • Adding the ability to plot specific edge lengths in the heat trees so they can be used for phylogenetic trees
  • Adding more data import and export functions to make parsing and writing common formats easier

To see the details of what is being worked on, check out the issues tab of the Metacoder GitHub site.

License

This work is subject to the MIT License.

Acknowledgements

Metacoder’s major dependencies are taxa, taxize, vegan, igraph, dplyr, and ggplot2.

This package includes code from the R package ggrepel to handle label overlap avoidance with permission from the author of ggrepel Kamil Slowikowski. We included the code instead of depending on ggrepel because we are using functions internal to ggrepel that might change in the future. We thank Kamil Slowikowski for letting us use his code and would like to acknowledge his implementation of the label overlap avoidance used in metacoder.

Feedback and contributions

We would like to hear users’ thoughts on the package and about any errors they run into. Please report errors, questions, or suggestions on the issues tab of the Metacoder GitHub site. We also welcome contributions via a GitHub pull request. You can also talk with us using our Google Groups site.

Monthly Downloads: 86
Version: 0.3.5
License: GPL-2 | GPL-3
Maintainer: Zachary Foster
Last Published: June 23rd, 2021

Functions in metacoder (0.3.5)

all_names

Return names of data in [taxonomy()] or [taxmap()]
ambiguous_synonyms

Get patterns for ambiguous taxa
arrange_taxa

Sort the edge list of [taxmap()] objects
ambiguous_patterns

Get patterns for ambiguous taxa
arrange_obs

Sort user data in [taxmap()] objects
all_functions

Get list of usable functions
add_alpha

add_alpha
as_id

Convert a vector to database IDs
apply_color_scale

Convert numbers to colors
DNAbin_to_char

Converts DNAbin to a named character vector
as_phyloseq

Convert taxmap to phyloseq
calc_group_median

Calculate medians of groups of columns
calc_group_mean

Calculate means of groups of columns
check_element_length

Check length of graph attributes
check_for_pkg

Check for packages
calc_group_stat

Apply a function to groups of columns
compare_groups

Compare groups of samples
counts_to_presence

Apply a function to groups of columns
classifications

Get classifications of taxa
calc_group_rsd

Relative standard deviations of groups of columns
count_capture_groups

Count capture groups
diverging_palette

The default diverging color palette
desc_font

Description formatting in print methods
convert_base

Converts decimal numbers to other bases
branches

Get "branch" taxa
check_option_groups

Check option: groups
complement

Find complement of sequences
do_calc_on_num_cols

Run some function to produce new columns.
.onAttach

Run when package loads
contains

dplyr select_helpers
check_taxmap_data

Check dataset format
filter_obs

Filter observations with a list of conditions
edge_list_depth

Get distance from root of edgelist observations
ends_with

dplyr select_helpers
filter_taxa

Filter taxa with a list of conditions
get_data

Get data in a taxmap object by name
get_data_frame

Get data in a taxonomy or taxmap object by name
ex_hierarchy2

An example Hierarchy object
filtering-helpers

Taxonomic filtering helpers
ex_hierarchy3

An example Hierarchy object
correct_taxon_names

Look up official names from potentially misspelled names
get_class_from_el

Get classification for taxa in edge list
get_numerics

Return numeric values in a character
get_dataset

Get a data set from a taxmap object
get_database_name

Return name of database
get_node_children

get_node_children
get_numeric_cols

Get numeric columns from taxmap table
get_taxmap_other_cols

Parse the other_cols option
get_taxmap_data

Get a data set from a taxmap object
can_be_num

Test if characters can be converted to numbers
can_be_used_in_taxmap

Check that an unknown object can be used with taxmap
calc_taxon_abund

Sum observation values for each taxon
calc_prop_samples

Calculate the proportion of samples
is_ambiguous

Find ambiguous taxon names
data_used

Get values of data used in expressions
error_font

Font to indicate an error
database_list

Database list
get_optimal_range

Find optimal range
everything

dplyr select_helpers
inverse

Generate the inverse of a function
is_leaf

Test if taxa are leaves
calc_n_samples

Count the number of samples
fasta_headers

Get line numbers of FASTA headers
num_range

dplyr select_helpers
leaves

Get leaf taxa
get_sort_var

Get a vector from a vector/list/table to be used in mapping
get_taxmap_cols

Get a column subset
make_plot_legend

Make color/size legend
inter_circle_gap

Finds the gap/overlap of circle coordinates
calc_obs_props

Calculate proportions from observation counts
heat_tree_matrix

Plot a matrix of heat trees
look_for_na

Look for NAs in parameters
filter_ambiguous_taxa

Filter ambiguous taxon names
n_leaves

Get number of leaves
get_expected_data

Get a data set in as_phyloseq
check_class_col

Check for name/index in input data
get_edge_parents

get_edge_parents
internodes

Get "internode" taxa
capitalize

Capitalize
hmp_samples

Sample information for HMP subset
hierarchies

Make a set of many [hierarchy()] class objects
is_branch

Test if taxa are branches
rev_comp

Reverse complement sequences
parse_summary_seqs

Parse summary.seqs output
hmp_otus

A HMP subset
parse_greengenes

Parse Greengenes release
n_subtaxa

Get number of subtaxa
parse_heirarchies_to_taxonomy

Infer edge list from hierarchies
is_root

Test if taxa are roots
ex_taxmap

An example taxmap object
ex_hierarchy1

An example Hierarchy object
ex_hierarchies

An example hierarchies object
names_used

Get names of data used in expressions
make_fasta_with_u_replaced

Make a temporary file with U's replaced with T
multi_sep_split

Like `strsplit`, but with multiple separators
lookup_tax_data

Convert one or more data sets to taxmap
rescale

Rescale numeric vector to have specified minimum and maximum.
get_edge_children

get_edge_children
id_classifications

Get ID classifications of taxa
extract_tax_data

Extracts taxonomy info from vectors with regex
get_dots_or_list

Get input from dots or list
n_leaves_1

Get number of leaves
label_bounds

Bounding box coords for labels
is_stem

Test if taxa are stems
hierarchy

The Hierarchy class
ncbi_taxon_sample

Download representative sequences for a taxon
init_taxmap_data

Convert `data` input for Taxmap
%>%

magrittr forward-pipe operator
mutate_obs

Add columns to [taxmap()] objects
print__data.frame

Print a data.frame
quantative_palette

The default quantitative color palette
n_obs

Count observations in [taxmap()]
primersearch

Use EMBOSS primersearch for in silico PCR
roots

Get root taxa
get_taxonomy_levels

Get taxonomy levels
get_taxmap_table

Get a table from a taxmap object
line_coords

Makes coordinates for a line
parse_dada2

Convert the output of dada2 to a taxmap object
map_unique

Run a function on unique values of an iterable
name_font

Variable name formatting in print methods
one_of

dplyr select_helpers
matches

dplyr select_helpers
limited_print

Print a subset of a character vector
print__default_

Print method for unsupported
parse_newick

Parse a Newick file
verify_label_count

Verify label count
stems

Get stem taxa
parse_phylo

Parse a phylo object
parse_edge_list

Convert a table with an edge list to taxmap
subtaxa

Get subtaxa
parse_mothur_tax_summary

Parse mothur *.tax.summary Classify.seqs output
heat_tree

Plot a taxonomic tree
parse_possibly_named_logical

Used to parse inputs to `drop_obs` and `reassign_obs`
is_internode

Test if taxa are "internodes"
ncbi_sequence

Downloads sequences from ids
parse_mothur_taxonomy

Parse mothur Classify.seqs *.taxonomy output
n_supertaxa_1

Get number of supertaxa
length_of_thing

Check length of thing
prefixed_print

Print an object with a prefix
leaves_apply

Apply function to leaves of each taxon
my_print

Print something
validate_taxmap_funcs

Validate `funcs` input for Taxmap
map_data

Create a mapping between two variables
write_unite_general

Write an imitation of the UNITE general FASTA database
sample_n_taxa

Sample n taxa from [taxonomy()] or [taxmap()]
parse_tax_data

Convert one or more data sets to taxmap
highlight_taxon_ids

Highlight taxon ID column
layout_functions

Layout functions
parse_ubiome

Converts the uBiome file format to taxmap
parse_seq_input

Read sequences in an unknown format
verify_taxmap

Check that an object is a taxmap
parse_silva_fasta

Parse SILVA FASTA release
metacoder

Metacoder
verify_color_range

Verify color range parameters
zero_low_counts

Replace low counts with zero
parse_unite_general

Parse UNITE general release FASTA
scale_bar_coords

Make scale bar division
run_primersearch

Execute EMBOSS Primersearch
split_by_level

Splits a taxonomy at a specific level or rank
print__list

Print a list
print__logical

Print a logical
simplify

List to vector of unique elements
make_dada2_asv_table

Make an imitation of the dada2 ASV abundance matrix
rarefy_obs

Calculate rarefied observation counts
make_dada2_tax_table

Make an imitation of the dada2 taxonomy matrix
parse_phyloseq

Convert a phyloseq to taxmap
sample_frac_taxa

Sample a proportion of taxa from [taxonomy()] or [taxmap()]
molten_dist

Get all distances between points
map_data_

Create a mapping without NSE
punc_font

Punctuation formatting in print methods
sample_frac_obs

Sample a proportion of observations from [taxmap()]
qualitative_palette

The default qualitative color palette
primersearch_raw

Use EMBOSS primersearch for in silico PCR
n_supertaxa

Get number of supertaxa
to_percent

Format a proportion as a printed percent
ranks_ref

Lookup-table for IDs of taxonomic ranks
validate_regex_key_pair

Check a regex-key pair
print__matrix

Print a matrix
taxon_ranks

Get taxon ranks
validate_regex_match

Check that all match input
n_obs_1

Count observation assigned in [taxmap()]
primersearch_is_installed

Test if primersearch is installed
print__numeric

Print a numeric
n_subtaxa_1

Get number of subtaxa
print__tbl_df

Print a tibble
reverse

Reverse sequences
taxon_database

Taxonomy database class
taxonomy

Taxonomy class
verify_trans

Verify transformation function parameters
taxon

Taxon class
taxon_rank

Taxon rank class
verify_size

Verify size parameters
taxon_names

Get taxon names
unique_mapping

Get indexes of a unique set of the input
obs

Get data indexes associated with taxa
verify_size_range

Verify size range parameters
obs_apply

Apply function to observations per taxon
read_lines_apply

Apply a function to chunks of a file
print__factor

Print a factor
select_labels

Pick labels to show
parse_dataset

Parse options specifying datasets
supertaxa_apply

Apply function to supertaxa of each taxon
select_obs

Subset columns in a [taxmap()] object
read_fasta

Read a FASTA file
remove_redundant_names

Remove redundant parts of taxon names
print__integer

Print an integer
taxon_indexes

Get taxon indexes
parse_qiime_biom

Parse a BIOM output from QIIME
print__ordered

Print an ordered factor
write_mothur_taxonomy

Write an imitation of the Mothur taxonomy file
parse_primersearch

Parse EMBOSS primersearch output
polygon_coords

Makes coordinates for a regular polygon
progress_lapply

lapply with progress bars
print__vector

Generic vector printer
starts_with

dplyr select_helpers
parse_raw_heirarchies_to_taxonomy

Infer edge list from hierarchies composed of character vectors
write_silva_fasta

Write an imitation of the SILVA FASTA database
taxa-package

taxa
taxon_name

Taxon name class
parse_rdp

Parse RDP FASTA release
tid_font

Taxon id formatting in print methods
sample_n_obs

Sample n observations from [taxmap()]
write_rdp

Write an imitation of the RDP FASTA database
taxa

A class for multiple taxon objects
print_item

Print an item
print__character

Print a character
taxonomy_table

Convert taxonomy info to a table
taxmap

Taxmap class
text_grob_length

Estimate text grob length
transform_data

Transformation functions
print_tree

Print a text tree
replace_taxon_ids

Replace taxon ids
subtaxa_apply

Apply function to subtaxa of each taxon
repo_url

Return GitHub URL
supertaxa

Get all supertaxa of a taxon
startup_msg

Return startup message
taxon_id

Taxon ID class
write_greengenes

Write an imitation of the Greengenes database
taxon_ids

Get taxon IDs
transmute_obs

Replace columns in [taxmap()] objects