taxa v0.3.2

0

Monthly downloads

0th

Percentile

Taxonomic Classes

Provides taxonomic classes for groupings of taxonomic names without data, and those with data. Methods provided are "taxonomically aware", in that they know about ordering of ranks, and methods that filter based on taxonomy also filter associated data.

Readme

taxa

Build
Status cran checks codecov Project Status: WIP - Initial development is in progress, but there
has not yet been a stable, usable release suitable for the
public. rstudio mirror
downloads cran
version

taxa defines taxonomic classes and functions to manipulate them. The goal is to use these classes as low level fundamental taxonomic classes that other R packages can build on and use.

There are two distinct types of classes in taxa:

  • Classes that are concerned only with taxonomic information: taxon, taxonomy, hierarchy, etc.
  • A class called taxmap that is concerned with combining taxonomic data with user-defined data of any type (e.g. molecular sequences, abundance counts etc.)

Diagram of class concepts for taxa classes:

Relationship between classes implemented in the taxa package. Diamond-tipped arrows indicate that objects of a one are used in another class. For example, a database object can stored in the taxon_rank, taxon_name, or taxon_id objects. A standard arrow indicates inheritance. For example, the taxmap class inherits the taxonomy class. * means that the object (e.g. a database object) can be replaced by a simple character vector. ? means that the data is optional.

Install

CRAN version

install.packages("taxa")

Development version from GitHub

devtools::install_github("ropensci/taxa")

library("taxa")

The classes

Minor component classes

There are a few optional classes used to store information in other classes. In most cases, these can be replaced with simple character values but using them provides more information and potential functionality.

database

Taxonomic data usually comes from a database. A common example is the NCBI Taxonomy Database used to provide taxonomic classifications to sequences deposited in other NCBI databases. The database class stores the name of the database and associated information:

(ncbi <- taxon_database(
  name = "ncbi",
  url = "http://www.ncbi.nlm.nih.gov/taxonomy",
  description = "NCBI Taxonomy Database",
  id_regex = "*"
))
#> <database> ncbi
#>   url: http://www.ncbi.nlm.nih.gov/taxonomy
#>   description: NCBI Taxonomy Database
#>   id regex: *
ncbi$name
#> [1] "ncbi"
ncbi$url
#> [1] "http://www.ncbi.nlm.nih.gov/taxonomy"

To save on memory, a selection of common databases is provided with the package (database_list) and any in this list can be used by name instead of making a new database object (e.g. "ncbi" instead of the ncbi above).

database_list
#> $ncbi
#> <database> ncbi
#>   url: http://www.ncbi.nlm.nih.gov/taxonomy
#>   description: NCBI Taxonomy Database
#>   id regex: .*
#> 
#> $gbif
#> <database> gbif
#>   url: http://www.gbif.org/developer/species
#>   description: GBIF Taxonomic Backbone
#>   id regex: .*
#> 
#> $bold
#> <database> bold
#>   url: http://www.boldsystems.org
#>   description: Barcode of Life
#>   id regex: .*
#> 
#> $col
#> <database> col
#>   url: http://www.catalogueoflife.org
#>   description: Catalogue of Life
#>   id regex: .*
#> 
#> $eol
#> <database> eol
#>   url: http://eol.org
#>   description: Encyclopedia of Life
#>   id regex: .*
#> 
#> $nbn
#> <database> nbn
#>   url: https://nbn.org.uk
#>   description: UK National Biodiversity Network
#>   id regex: .*
#> 
#> $tps
#> <database> tps
#>   url: http://www.tropicos.org/
#>   description: Tropicos
#>   id regex: .*
#> 
#> $itis
#> <database> itis
#>   url: http://www.itis.gov
#>   description: Integrated Taxonomic Information System
#>   id regex: .*

rank

Taxa might have defined ranks (e.g. species, family, etc.), ambiguous ranks (e.g. "unranked", "unknown"), or no rank information at all. The particular selection and format of valid ranks varies with database, so the database can be optionally defined. If no database is defined, any ranks in any order are allowed.

taxon_rank(name = "species", database = "ncbi")
#> <TaxonRank> species
#>   database: ncbi

taxon_name

The taxon name can be defined in the same way as rank.

taxon_name("Poa", database = "ncbi")
#> <TaxonName> Poa
#>   database: ncbi

taxon_id

Each database has its set of unique taxon IDs. These IDs are better than using the taxon name directly because they are guaranteed to be unique, whereas there are often duplicates of taxon names (e.g. Orestias elegans is the name of both an orchid and a fish).

taxon_id(12345, database = "ncbi")
#> <TaxonId> 12345
#>   database: ncbi

The "taxon" class

The taxon class combines the classes containing the name, rank, and ID for the taxon. There is also a place to define an authority of the taxon.

(x <- taxon(
  name = taxon_name("Poa annua"),
  rank = taxon_rank("species"),
  id = taxon_id(93036),
  authority = "Linnaeus"
))
#> <Taxon>
#>   name: Poa annua
#>   rank: species
#>   id: 93036
#>   authority: none

Instead of the name, rank, and ID classes, simple character vectors can be supplied.

(x <- taxon(
  name = "Poa annua",
  rank = "species",
  id = 93036,
  authority = "Linnaeus"
))
#> <Taxon>
#>   name: Poa annua
#>   rank: species
#>   id: 93036
#>   authority: none

The taxa class is just a list of taxon classes with some custom print methods. It is meant to store an arbitrary list of taxon.

(x <- taxon(
  name = taxon_name("Poa annua"),
  rank = taxon_rank("species"),
  id = taxon_id(93036)
))
#> <Taxon>
#>   name: Poa annua
#>   rank: species
#>   id: 93036
#>   authority: none
taxa(x, x, x)
#> <taxa> 
#>   no. taxa:  3 
#>   Poa annua / species / 93036 
#>   Poa annua / species / 93036 
#>   Poa annua / species / 93036

The "hierarchy" class

Taxonomic classifications#Classifying_organisms) are an ordered set of taxa, each at a different rank. The hierarchy class stores a list of taxon classes like taxa, but hierarchy is meant to store all of the taxa in a classification in the correct order.

x <- taxon(
  name = taxon_name("Poaceae"),
  rank = taxon_rank("family"),
  id = taxon_id(4479)
)

y <- taxon(
  name = taxon_name("Poa"),
  rank = taxon_rank("genus"),
  id = taxon_id(4544)
)

z <- taxon(
  name = taxon_name("Poa annua"),
  rank = taxon_rank("species"),
  id = taxon_id(93036)
)

(hier1 <- hierarchy(z, y, x))
#> <Hierarchy>
#>   no. taxon's:  3 
#>   Poaceae / family / 4479 
#>   Poa / genus / 4544 
#>   Poa annua / species / 93036

Multiple hierarchy classes are stored in the hierarchies class, similar to how multiple taxon are stored in taxa.

a <- taxon(
  name = taxon_name("Felidae"),
  rank = taxon_rank("family"),
  id = taxon_id(9681)
)
b <- taxon(
  name = taxon_name("Puma"),
  rank = taxon_rank("genus"),
  id = taxon_id(146712)
)
c <- taxon(
  name = taxon_name("Puma concolor"),
  rank = taxon_rank("species"),
  id = taxon_id(9696)
)
(hier2 <- hierarchy(c, b, a))
#> <Hierarchy>
#>   no. taxon's:  3 
#>   Felidae / family / 9681 
#>   Puma / genus / 146712 
#>   Puma concolor / species / 9696

hierarchies(hier1, hier2)
#> <Hierarchies> 
#>   no. hierarchies:  2 
#>   Poaceae / Poa / Poa annua 
#>   Felidae / Puma / Puma concolor

The "taxonomy" class

The taxonomy class stores unique taxon objects in a tree structure. Usually this kind of complex information would be the output of a file parsing function, but the code below shows how to construct a taxonomy object from scratch.

# define taxa
notoryctidae <- taxon(name = "Notoryctidae", rank = "family", id = 4479)
notoryctes <- taxon(name = "Notoryctes", rank = "genus", id = 4544)
typhlops <- taxon(name = "typhlops", rank = "species", id = 93036)
mammalia <- taxon(name = "Mammalia", rank = "class", id = 9681)
felidae <- taxon(name = "Felidae", rank = "family", id = 9681)
felis <- taxon(name = "Felis", rank = "genus", id = 9682)
catus <- taxon(name = "catus", rank = "species", id = 9685)
panthera <- taxon(name = "Panthera", rank = "genus", id = 146712)
tigris <- taxon(name = "tigris", rank = "species", id = 9696)
plantae <- taxon(name = "Plantae", rank = "kingdom", id = 33090)
solanaceae <- taxon(name = "Solanaceae", rank = "family", id = 4070)
solanum <- taxon(name = "Solanum", rank = "genus", id = 4107)
lycopersicum <- taxon(name = "lycopersicum", rank = "species", id = 49274)
tuberosum <- taxon(name = "tuberosum", rank = "species", id = 4113)
homo <- taxon(name = "homo", rank = "genus", id = 9605)
sapiens <- taxon(name = "sapiens", rank = "species", id = 9606)
hominidae <- taxon(name = "Hominidae", rank = "family", id = 9604)

# define hierarchies
tiger <- hierarchy(mammalia, felidae, panthera, tigris)
cat <- hierarchy(mammalia, felidae, felis, catus)
human <- hierarchy(mammalia, hominidae, homo, sapiens)
mole <- hierarchy(mammalia, notoryctidae, notoryctes, typhlops)
tomato <- hierarchy(plantae, solanaceae, solanum, lycopersicum)
potato <- hierarchy(plantae, solanaceae, solanum, tuberosum)

# make taxonomy
(tax <- taxonomy(tiger, cat, human, tomato, potato))
#> <Taxonomy>
#>   14 taxa: b. Mammalia, c. Plantae ... o. tuberosum
#>   14 edges: NA->b, NA->c, b->d, b->e ... i->m, j->n, j->o

Unlike the hierarchies class, each unique taxon object is only represented once in the taxonomy object. Each taxon has a corresponding entry in an edge list that encode how it is related to other taxa. This makes taxonomy more compact, but harder to manipulate using standard indexing. To make manipulation easier, there are methods for taxomomy that can provide indexes in a taxonomic context.

supertaxa

A "supertaxon" is a taxon of a coarser rank that encompasses the taxon of interest (e.g. "Homo" is a supertaxon of "sapiens"). The supertaxa function returns the supertaxa of all or a subset of the taxa in a taxonomy object.

supertaxa(tax)
#> $b
#> named integer(0)
#> 
#> $c
#> named integer(0)
#> 
#> $d
#> b 
#> 1 
#> 
#> $e
#> b 
#> 1 
#> 
#> $f
#> c 
#> 2 
#> 
#> $g
#> d b 
#> 3 1 
#> 
#> $h
#> d b 
#> 3 1 
#> 
#> $i
#> e b 
#> 4 1 
#> 
#> $j
#> f c 
#> 5 2 
#> 
#> $k
#> g d b 
#> 6 3 1 
#> 
#> $l
#> h d b 
#> 7 3 1 
#> 
#> $m
#> i e b 
#> 8 4 1 
#> 
#> $n
#> j f c 
#> 9 5 2 
#> 
#> $o
#> j f c 
#> 9 5 2

By default, the taxon IDs for the supertaxa of all taxa are returned in the same order they appear in the edge list. Taxon IDs (character) or edge list indexes (integer) can be supplied to the subset option to only return information for some taxa.

supertaxa(tax, subset = "m")
#> $m
#> i e b 
#> 8 4 1

What is returned can be modified with the value option:

supertaxa(tax, subset = "m", value = "taxon_names")
#> $m
#>           i           e           b 
#>      "homo" "Hominidae"  "Mammalia"

supertaxa(tax, subset = "m", value = "taxon_ranks")
#> $m
#>        i        e        b 
#>  "genus" "family"  "class"

You can also subset based on a logical test:

supertaxa(tax, subset = taxon_ranks == "genus", value = "taxon_names")
#> $g
#>          d          b 
#>  "Felidae" "Mammalia" 
#> 
#> $h
#>          d          b 
#>  "Felidae" "Mammalia" 
#> 
#> $i
#>           e           b 
#> "Hominidae"  "Mammalia" 
#> 
#> $j
#>            f            c 
#> "Solanaceae"    "Plantae"

The subset and value work the same for most of the following functions as well. See tax$all_names() for what can be used with value.

subtaxa

The "subtaxa" of a taxon are all those of a finer rank encompassed by that taxon. For example, sapiens is a subtaxon of Homo. The subtaxa function returns all subtaxa for each taxon in a taxonomy object.

subtaxa(tax, value = "taxon_names")
#> $b
#>           d           g           k           h           l           e 
#>   "Felidae"  "Panthera"    "tigris"     "Felis"     "catus" "Hominidae" 
#>           i           m 
#>      "homo"   "sapiens" 
#> 
#> $c
#>              f              j              n              o 
#>   "Solanaceae"      "Solanum" "lycopersicum"    "tuberosum" 
#> 
#> $d
#>          g          k          h          l 
#> "Panthera"   "tigris"    "Felis"    "catus" 
#> 
#> $e
#>         i         m 
#>    "homo" "sapiens" 
#> 
#> $f
#>              j              n              o 
#>      "Solanum" "lycopersicum"    "tuberosum" 
#> 
#> $g
#>        k 
#> "tigris" 
#> 
#> $h
#>       l 
#> "catus" 
#> 
#> $i
#>         m 
#> "sapiens" 
#> 
#> $j
#>              n              o 
#> "lycopersicum"    "tuberosum" 
#> 
#> $k
#> named character(0)
#> 
#> $l
#> named character(0)
#> 
#> $m
#> named character(0)
#> 
#> $n
#> named character(0)
#> 
#> $o
#> named character(0)

roots

We call taxa that have no supertaxa "roots". The roots function returns these taxa.

roots(tax, value = "taxon_names")
#>          b          c 
#> "Mammalia"  "Plantae"

leaves

We call taxa without any subtaxa "leaves". The leaves function returns these taxa.

leaves(tax, value = "taxon_names")
#> $b
#>         k         l         m 
#>  "tigris"   "catus" "sapiens" 
#> 
#> $c
#>              n              o 
#> "lycopersicum"    "tuberosum" 
#> 
#> $d
#>        k        l 
#> "tigris"  "catus" 
#> 
#> $e
#>         m 
#> "sapiens" 
#> 
#> $f
#>              n              o 
#> "lycopersicum"    "tuberosum" 
#> 
#> $g
#>        k 
#> "tigris" 
#> 
#> $h
#>       l 
#> "catus" 
#> 
#> $i
#>         m 
#> "sapiens" 
#> 
#> $j
#>              n              o 
#> "lycopersicum"    "tuberosum" 
#> 
#> $k
#> named character(0)
#> 
#> $l
#> named character(0)
#> 
#> $m
#> named character(0)
#> 
#> $n
#> named character(0)
#> 
#> $o
#> named character(0)

other functions

There are many other functions to interact with taxonomy object, such as stems and n_subtaxa, but these will not be described here for now.

The "taxmap" class

The taxmap class is used to store any number of tables, lists, or vectors associated with taxa. It is basically the same as the taxonomy class, but with the following additions:

  • A list called data that stores arbitrary user data associated with taxa
  • A list called funcs that stores user defined functions
info <- data.frame(name = c("tiger", "cat", "mole", "human", "tomato", "potato"),
                   n_legs = c(4, 4, 4, 2, 0, 0),
                   dangerous = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE))

phylopic_ids <- c("e148eabb-f138-43c6-b1e4-5cda2180485a",
                  "12899ba0-9923-4feb-a7f9-758c3c7d5e13",
                  "11b783d5-af1c-4f4e-8ab5-a51470652b47",
                  "9fae30cd-fb59-4a81-a39c-e1826a35f612",
                  "b6400f39-345a-4711-ab4f-92fd4e22cb1a",
                  "63604565-0406-460b-8cb8-1abe954b3f3a")

foods <- list(c("mammals", "birds"),
              c("cat food", "mice"),
              c("insects"),
              c("Most things, but especially anything rare or expensive"),
              c("light", "dirt"),
              c("light", "dirt"))

reaction <- function(x) {
  ifelse(x$data$info$dangerous,
         paste0("Watch out! That ", x$data$info$name, " might attack!"),
         paste0("No worries; its just a ", x$data$info$name, "."))
}

my_taxmap <- taxmap(tiger, cat, mole, human, tomato, potato,
                    data = list(info = info,
                                phylopic_ids = phylopic_ids,
                                foods = foods),
                    funcs = list(reaction = reaction))

In most functions that work with taxmap objects, the names of list/vector datasets, table columns, or functions can be used as if they were separate variables on their own. In the case of functions, instead of returning the function itself, the results of the functions are returned. To see what variables can be used this way, use all_names.

all_names(my_taxmap)
#>         taxon_names           taxon_ids       taxon_indexes 
#>       "taxon_names"         "taxon_ids"     "taxon_indexes" 
#>     classifications         n_supertaxa       n_supertaxa_1 
#>   "classifications"       "n_supertaxa"     "n_supertaxa_1" 
#>           n_subtaxa         n_subtaxa_1         taxon_ranks 
#>         "n_subtaxa"       "n_subtaxa_1"       "taxon_ranks" 
#>             is_root             is_stem           is_branch 
#>           "is_root"           "is_stem"         "is_branch" 
#>             is_leaf        is_internode               n_obs 
#>           "is_leaf"      "is_internode"             "n_obs" 
#>             n_obs_1      data$info$name    data$info$n_legs 
#>           "n_obs_1"              "name"            "n_legs" 
#> data$info$dangerous   data$phylopic_ids          data$foods 
#>         "dangerous"      "phylopic_ids"             "foods" 
#>      funcs$reaction 
#>          "reaction"

For example using my_taxmap$data$info$n_legs or n_legs will have the same effect inside manipulation functions like filter_taxa described below. To get the values of these variables, use get_data.

get_data(my_taxmap)
#> $taxon_names
#>              b              c              d              e              f 
#>     "Mammalia"      "Plantae"      "Felidae" "Notoryctidae"    "Hominidae" 
#>              g              h              i              j              k 
#>   "Solanaceae"     "Panthera"        "Felis"   "Notoryctes"         "homo" 
#>              l              m              n              o              p 
#>      "Solanum"       "tigris"        "catus"     "typhlops"      "sapiens" 
#>              q              r 
#> "lycopersicum"    "tuberosum" 
#> 
#> $taxon_ids
#>   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r 
#> "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" 
#> 
#> $taxon_indexes
#>  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r 
#>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 
#> 
#> $classifications
#>                                           b 
#>                                  "Mammalia" 
#>                                           c 
#>                                   "Plantae" 
#>                                           d 
#>                          "Mammalia;Felidae" 
#>                                           e 
#>                     "Mammalia;Notoryctidae" 
#>                                           f 
#>                        "Mammalia;Hominidae" 
#>                                           g 
#>                        "Plantae;Solanaceae" 
#>                                           h 
#>                 "Mammalia;Felidae;Panthera" 
#>                                           i 
#>                    "Mammalia;Felidae;Felis" 
#>                                           j 
#>          "Mammalia;Notoryctidae;Notoryctes" 
#>                                           k 
#>                   "Mammalia;Hominidae;homo" 
#>                                           l 
#>                "Plantae;Solanaceae;Solanum" 
#>                                           m 
#>          "Mammalia;Felidae;Panthera;tigris" 
#>                                           n 
#>              "Mammalia;Felidae;Felis;catus" 
#>                                           o 
#> "Mammalia;Notoryctidae;Notoryctes;typhlops" 
#>                                           p 
#>           "Mammalia;Hominidae;homo;sapiens" 
#>                                           q 
#>   "Plantae;Solanaceae;Solanum;lycopersicum" 
#>                                           r 
#>      "Plantae;Solanaceae;Solanum;tuberosum" 
#> 
#> $n_supertaxa
#> b c d e f g h i j k l m n o p q r 
#> 0 0 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 
#> 
#> $n_supertaxa_1
#> b c d e f g h i j k l m n o p q r 
#> 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
#> 
#> $n_subtaxa
#>  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r 
#> 11  4  4  2  2  3  1  1  1  1  2  0  0  0  0  0  0 
#> 
#> $n_subtaxa_1
#> b c d e f g h i j k l m n o p q r 
#> 3 1 2 1 1 1 1 1 1 1 2 0 0 0 0 0 0 
#> 
#> $taxon_ranks
#>         b         c         d         e         f         g         h 
#>   "class" "kingdom"  "family"  "family"  "family"  "family"   "genus" 
#>         i         j         k         l         m         n         o 
#>   "genus"   "genus"   "genus"   "genus" "species" "species" "species" 
#>         p         q         r 
#> "species" "species" "species" 
#> 
#> $is_root
#>     b     c     d     e     f     g     h     i     j     k     l     m 
#>  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 
#>     n     o     p     q     r 
#> FALSE FALSE FALSE FALSE FALSE 
#> 
#> $is_stem
#>     b     c     d     e     f     g     h     i     j     k     l     m 
#> FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE 
#>     n     o     p     q     r 
#> FALSE FALSE FALSE FALSE FALSE 
#> 
#> $is_branch
#>     b     c     d     e     f     g     h     i     j     k     l     m 
#> FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE 
#>     n     o     p     q     r 
#> FALSE FALSE FALSE FALSE FALSE 
#> 
#> $is_leaf
#>     b     c     d     e     f     g     h     i     j     k     l     m 
#> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE 
#>     n     o     p     q     r 
#>  TRUE  TRUE  TRUE  TRUE  TRUE 
#> 
#> $is_internode
#>     b     c     d     e     f     g     h     i     j     k     l     m 
#> FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE 
#>     n     o     p     q     r 
#> FALSE FALSE FALSE FALSE FALSE 
#> 
#> $n_obs
#> b c d e f g h i j k l m n o p q r 
#> 4 2 2 1 1 2 1 1 1 1 2 1 1 1 1 1 1 
#> 
#> $n_obs_1
#> b c d e f g h i j k l m n o p q r 
#> 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 
#> 
#> $name
#>      m      n      o      p      q      r 
#>  tiger    cat   mole  human tomato potato 
#> Levels: cat human mole potato tiger tomato
#> 
#> $n_legs
#> m n o p q r 
#> 4 4 4 2 0 0 
#> 
#> $dangerous
#>     m     n     o     p     q     r 
#>  TRUE FALSE FALSE  TRUE FALSE FALSE 
#> 
#> $phylopic_ids
#>                                      m 
#> "e148eabb-f138-43c6-b1e4-5cda2180485a" 
#>                                      n 
#> "12899ba0-9923-4feb-a7f9-758c3c7d5e13" 
#>                                      o 
#> "11b783d5-af1c-4f4e-8ab5-a51470652b47" 
#>                                      p 
#> "9fae30cd-fb59-4a81-a39c-e1826a35f612" 
#>                                      q 
#> "b6400f39-345a-4711-ab4f-92fd4e22cb1a" 
#>                                      r 
#> "63604565-0406-460b-8cb8-1abe954b3f3a" 
#> 
#> $foods
#> $foods$m
#> [1] "mammals" "birds"  
#> 
#> $foods$n
#> [1] "cat food" "mice"    
#> 
#> $foods$o
#> [1] "insects"
#> 
#> $foods$p
#> [1] "Most things, but especially anything rare or expensive"
#> 
#> $foods$q
#> [1] "light" "dirt" 
#> 
#> $foods$r
#> [1] "light" "dirt" 
#> 
#> 
#> $reaction
#> [1] "Watch out! That tiger might attack!"
#> [2] "No worries; its just a cat."        
#> [3] "No worries; its just a mole."       
#> [4] "Watch out! That human might attack!"
#> [5] "No worries; its just a tomato."     
#> [6] "No worries; its just a potato."

Note how "taxon_names" and "dangerous" are used below.

Filtering

In addition to all of the functions like subtaxa that work with taxonomy, taxmap has a set of functions to manipulate data in a taxonomic context using functions based on dplyr. Like many operations on taxmap objects, there are a pair of functions that modify the taxa as well as the associated data, which we call "observations". The filter_taxa and filter_obs functions are an example of such a pair that can filter taxa and observations respectively. For example, we can use filter_taxa to subset all taxa with a name starting with "t":

filter_taxa(my_taxmap, startsWith(taxon_names, "t"))
#> <Taxmap>
#>   3 taxa: m. tigris, o. typhlops, r. tuberosum
#>   3 edges: NA->m, NA->o, NA->r
#>   3 data sets:
#>     info:
#>     # A tibble: 3 x 4
#>       name   n_legs dangerous taxon_id
#>       <fct>   <dbl> <lgl>     <chr>   
#>     1 tiger    4.00 T         m       
#>     2 mole     4.00 F         o       
#>     3 potato   0    F         r       
#>     phylopic_ids: a named character with 3 items
#>        m. e148eabb-f138-43[truncated] ... r. 63604565-0406-46[truncated]
#>     foods: a list with 3 items with names:
#>        m, o, r
#>   1 functions:
#>  reaction

There can be any number of filters that resolve to TRUE/FALSE vectors, taxon ids, or edge list indexes.

filter_taxa(my_taxmap, startsWith(taxon_names, "t"), "r")

There are many options for filter_taxa that make it very flexible. For example, the supertaxa option can make all the supertaxa of selected taxa be preserved.

filter_taxa(my_taxmap, startsWith(taxon_names, "t"), supertaxa = TRUE)
#> <Taxmap>
#>   11 taxa: m. tigris, o. typhlops ... c. Plantae
#>   11 edges: h->m, j->o, l->r, d->h ... b->e, g->l, c->g, NA->c
#>   3 data sets:
#>     info:
#>     # A tibble: 6 x 4
#>       name  n_legs dangerous taxon_id
#>       <fct>  <dbl> <lgl>     <chr>   
#>     1 tiger   4.00 T         m       
#>     2 cat     4.00 F         d       
#>     3 mole    4.00 F         o       
#>     # ... with 3 more rows
#>     phylopic_ids: a named character with 6 items
#>        m. e148eabb-f138-43[truncated] ... r. 63604565-0406-46[truncated]
#>     foods: a list with 6 items with names:
#>        m, d, o, b, l, r
#>   1 functions:
#>  reaction

The filter_obs function works in a similar way, but subsets observations in my_taxmap$data.

filter_obs(my_taxmap, "info", dangerous == TRUE)
#> <Taxmap>
#>   17 taxa: b. Mammalia, c. Plantae ... r. tuberosum
#>   17 edges: NA->b, NA->c, b->d, b->e ... k->p, l->q, l->r
#>   3 data sets:
#>     info:
#>     # A tibble: 2 x 4
#>       name  n_legs dangerous taxon_id
#>       <fct>  <dbl> <lgl>     <chr>   
#>     1 tiger   4.00 T         m       
#>     2 human   2.00 T         p       
#>     phylopic_ids: a named character with 6 items
#>        m. e148eabb-f138-43[truncated] ... r. 63604565-0406-46[truncated]
#>     foods: a list with 6 items with names:
#>        m, n, o, p, q, r
#>   1 functions:
#>  reaction

You can choose to filter out taxa whose observations did not pass the filter as well:

filter_obs(my_taxmap, "info", dangerous == TRUE, drop_taxa = TRUE)
#> <Taxmap>
#>   7 taxa: b. Mammalia, d. Felidae ... m. tigris, p. sapiens
#>   7 edges: NA->b, b->d, b->f, d->h, f->k, h->m, k->p
#>   3 data sets:
#>     info:
#>     # A tibble: 2 x 4
#>       name  n_legs dangerous taxon_id
#>       <fct>  <dbl> <lgl>     <chr>   
#>     1 tiger   4.00 T         m       
#>     2 human   2.00 T         p       
#>     phylopic_ids: a named character with 6 items
#>        m. e148eabb-f138-43[truncated] ... r. 63604565-0406-46[truncated]
#>     foods: a list with 6 items with names:
#>        m, n, o, p, q, r
#>   1 functions:
#>  reaction

Sampling

The functions sample_n_obs and sample_n_taxa are similar to filter_obs and filter_taxa, except taxa/observations are chosen randomly. All of the options of the "filter_" functions are available to the "sample_" functions

set.seed(1)
sample_n_taxa(my_taxmap, 3)
#> <Taxmap>
#>   3 taxa: g. Solanaceae, i. Felis, m. tigris
#>   3 edges: NA->g, NA->i, NA->m
#>   3 data sets:
#>     info:
#>     # A tibble: 4 x 4
#>       name   n_legs dangerous taxon_id
#>       <fct>   <dbl> <lgl>     <chr>   
#>     1 tiger    4.00 T         m       
#>     2 cat      4.00 F         i       
#>     3 tomato   0    F         g       
#>     # ... with 1 more row
#>     phylopic_ids: a named character with 4 items
#>        m. e148eabb-f138-43[truncated] ... g. 63604565-0406-46[truncated]
#>     foods: a list with 4 items with names:
#>        m, i, g, g
#>   1 functions:
#>  reaction
set.seed(1)
sample_n_taxa(my_taxmap, 3, supertaxa = TRUE)
#> <Taxmap>
#>   7 taxa: g. Solanaceae, i. Felis ... b. Mammalia, h. Panthera
#>   7 edges: c->g, d->i, h->m, NA->c, b->d, NA->b, d->h
#>   3 data sets:
#>     info:
#>     # A tibble: 6 x 4
#>       name  n_legs dangerous taxon_id
#>       <fct>  <dbl> <lgl>     <chr>   
#>     1 tiger   4.00 T         m       
#>     2 cat     4.00 F         i       
#>     3 mole    4.00 F         b       
#>     # ... with 3 more rows
#>     phylopic_ids: a named character with 6 items
#>        m. e148eabb-f138-43[truncated] ... g. 63604565-0406-46[truncated]
#>     foods: a list with 6 items with names:
#>        m, i, b, b, g, g
#>   1 functions:
#>  reaction

Adding columns

Adding columns to tabular datasets is done using mutate_obs.

mutate_obs(my_taxmap, "info",
           new_col = "Im new",
           newer_col = paste0(new_col, "er!"))
#> <Taxmap>
#>   17 taxa: b. Mammalia, c. Plantae ... r. tuberosum
#>   17 edges: NA->b, NA->c, b->d, b->e ... k->p, l->q, l->r
#>   3 data sets:
#>     info:
#>     # A tibble: 6 x 6
#>       name  n_legs dangerous taxon_id new_col newer_col
#>       <fct>  <dbl> <lgl>     <chr>    <chr>   <chr>    
#>     1 tiger   4.00 T         m        Im new  Im newer!
#>     2 cat     4.00 F         n        Im new  Im newer!
#>     3 mole    4.00 F         o        Im new  Im newer!
#>     # ... with 3 more rows
#>     phylopic_ids: a named character with 6 items
#>        m. e148eabb-f138-43[truncated] ... r. 63604565-0406-46[truncated]
#>     foods: a list with 6 items with names:
#>        m, n, o, p, q, r
#>   1 functions:
#>  reaction

Subsetting columns

Subsetting columns in tabular datasets is done using select_obs.

# Selecting a column by name
select_obs(my_taxmap, "info", dangerous)
#> <Taxmap>
#>   17 taxa: b. Mammalia, c. Plantae ... r. tuberosum
#>   17 edges: NA->b, NA->c, b->d, b->e ... k->p, l->q, l->r
#>   3 data sets:
#>     info:
#>     # A tibble: 6 x 2
#>       taxon_id dangerous
#>       <chr>    <lgl>    
#>     1 m        T        
#>     2 n        F        
#>     3 o        F        
#>     # ... with 3 more rows
#>     phylopic_ids: a named character with 6 items
#>        m. e148eabb-f138-43[truncated] ... r. 63604565-0406-46[truncated]
#>     foods: a list with 6 items with names:
#>        m, n, o, p, q, r
#>   1 functions:
#>  reaction

# Selecting a column by index
select_obs(my_taxmap, "info", 3)
#> <Taxmap>
#>   17 taxa: b. Mammalia, c. Plantae ... r. tuberosum
#>   17 edges: NA->b, NA->c, b->d, b->e ... k->p, l->q, l->r
#>   3 data sets:
#>     info:
#>     # A tibble: 6 x 2
#>       taxon_id dangerous
#>       <chr>    <lgl>    
#>     1 m        T        
#>     2 n        F        
#>     3 o        F        
#>     # ... with 3 more rows
#>     phylopic_ids: a named character with 6 items
#>        m. e148eabb-f138-43[truncated] ... r. 63604565-0406-46[truncated]
#>     foods: a list with 6 items with names:
#>        m, n, o, p, q, r
#>   1 functions:
#>  reaction

# Selecting a column by regular expressions
select_obs(my_taxmap, "info", matches("^dange"))
#> <Taxmap>
#>   17 taxa: b. Mammalia, c. Plantae ... r. tuberosum
#>   17 edges: NA->b, NA->c, b->d, b->e ... k->p, l->q, l->r
#>   3 data sets:
#>     info:
#>     # A tibble: 6 x 2
#>       taxon_id dangerous
#>       <chr>    <lgl>    
#>     1 m        T        
#>     2 n        F        
#>     3 o        F        
#>     # ... with 3 more rows
#>     phylopic_ids: a named character with 6 items
#>        m. e148eabb-f138-43[truncated] ... r. 63604565-0406-46[truncated]
#>     foods: a list with 6 items with names:
#>        m, n, o, p, q, r
#>   1 functions:
#>  reaction

Sorting

Sorting the edge list and observations is done using arrage_taxa and arrange_obs.

arrange_taxa(my_taxmap, taxon_names)
#> <Taxmap>
#>   17 taxa: n. catus, d. Felidae ... r. tuberosum, o. typhlops
#>   17 edges: i->n, b->d, d->i, b->f ... g->l, h->m, l->r, j->o
#>   3 data sets:
#>     info:
#>     # A tibble: 6 x 4
#>       name  n_legs dangerous taxon_id
#>       <fct>  <dbl> <lgl>     <chr>   
#>     1 tiger   4.00 T         m       
#>     2 cat     4.00 F         n       
#>     3 mole    4.00 F         o       
#>     # ... with 3 more rows
#>     phylopic_ids: a named character with 6 items
#>        m. e148eabb-f138-43[truncated] ... r. 63604565-0406-46[truncated]
#>     foods: a list with 6 items with names:
#>        m, n, o, p, q, r
#>   1 functions:
#>  reaction
arrange_obs(my_taxmap, "info", name)
#> <Taxmap>
#>   17 taxa: b. Mammalia, c. Plantae ... r. tuberosum
#>   17 edges: NA->b, NA->c, b->d, b->e ... k->p, l->q, l->r
#>   3 data sets:
#>     info:
#>     # A tibble: 6 x 4
#>       name  n_legs dangerous taxon_id
#>       <fct>  <dbl> <lgl>     <chr>   
#>     1 cat     4.00 F         n       
#>     2 human   2.00 T         p       
#>     3 mole    4.00 F         o       
#>     # ... with 3 more rows
#>     phylopic_ids: a named character with 6 items
#>        m. e148eabb-f138-43[truncated] ... r. 63604565-0406-46[truncated]
#>     foods: a list with 6 items with names:
#>        m, n, o, p, q, r
#>   1 functions:
#>  reaction

Parsing data

The taxmap class has the ability to contain and manipulate very complex data. However, this can make it difficult to parse the data into a taxmap object. For this reason, there are three functions to help creating taxmap objects from nearly any kind of data that a taxonomy can be associated with or derived from. The figure below shows simplified versions of how to create taxmap objects from different types of data in different formats.

The parse_tax_data and lookup_tax_data have, in addition to the functionality above, the ability to include additional data sets that are somehow associated with the source datasets (e.g. share a common identifier). Elements in these datasets will be assigned the taxa defined in the source data, so functions like filter_taxa and filter_obs will work on all of the dataset at once.

Parsing Hierarchy and hierarchies objects

A set of functions are available for parsing objects of class Hierarchy and hierarchies. These functions are being ported from the CRAN package binomen.

The functions below are "taxonomically aware" so that you can use for example > and < operators to filter your taxonomic names data.

pick

pick() - Pick out specific taxa, while others are dropped

ex_hierarchy1
#> <Hierarchy>
#>   no. taxon's:  3 
#>   Poaceae / family / 4479 
#>   Poa / genus / 4544 
#>   Poa annua / species / 93036
# specific ranks by rank name
pick(ex_hierarchy1, ranks("family"))
#> <Hierarchy>
#>   no. taxon's:  1 
#>   Poaceae / family / 4479
# two elements by taxonomic name
pick(ex_hierarchy1, nms("Poaceae", "Poa"))
#> <Hierarchy>
#>   no. taxon's:  2 
#>   Poaceae / family / 4479 
#>   Poa / genus / 4544
# two elements by taxonomic identifier
pick(ex_hierarchy1, ids(4479, 4544))
#> <Hierarchy>
#>   no. taxon's:  2 
#>   Poaceae / family / 4479 
#>   Poa / genus / 4544
# combine types
pick(ex_hierarchy1, ranks("family"), ids(4544))
#> <Hierarchy>
#>   no. taxon's:  2 
#>   Poaceae / family / 4479 
#>   Poa / genus / 4544

pop

pop() - Pop out taxa, that is, drop them

ex_hierarchy1
#> <Hierarchy>
#>   no. taxon's:  3 
#>   Poaceae / family / 4479 
#>   Poa / genus / 4544 
#>   Poa annua / species / 93036
# specific ranks by rank name
pop(ex_hierarchy1, ranks("family"))
#> <Hierarchy>
#>   no. taxon's:  2 
#>   Poa / genus / 4544 
#>   Poa annua / species / 93036
# two elements by taxonomic name
pop(ex_hierarchy1, nms("Poaceae", "Poa"))
#> <Hierarchy>
#>   no. taxon's:  1 
#>   Poa annua / species / 93036
# two elements by taxonomic identifier
pop(ex_hierarchy1, ids(4479, 4544))
#> <Hierarchy>
#>   no. taxon's:  1 
#>   Poa annua / species / 93036
# combine types
pop(ex_hierarchy1, ranks("family"), ids(4544))
#> <Hierarchy>
#>   no. taxon's:  1 
#>   Poa annua / species / 93036

span

span() - Select a range of taxa, either by two names, or relational operators

ex_hierarchy1
#> <Hierarchy>
#>   no. taxon's:  3 
#>   Poaceae / family / 4479 
#>   Poa / genus / 4544 
#>   Poa annua / species / 93036
# keep all taxa between family and genus
# - by rank name, taxonomic name or ID
span(ex_hierarchy1, nms("Poaceae", "Poa"))
#> <Hierarchy>
#>   no. taxon's:  2 
#>   Poaceae / family / 4479 
#>   Poa / genus / 4544

# keep all taxa greater than genus
span(ex_hierarchy1, ranks("> genus"))
#> <Hierarchy>
#>   no. taxon's:  1 
#>   Poaceae / family / 4479

# keep all taxa greater than or equal to genus
span(ex_hierarchy1, ranks(">= genus"))
#> <Hierarchy>
#>   no. taxon's:  2 
#>   Poaceae / family / 4479 
#>   Poa / genus / 4544

# keep all taxa less than Felidae
span(ex_hierarchy2, nms("< Felidae"))
#> <Hierarchy>
#>   no. taxon's:  2 
#>   Puma / genus / 146712 
#>   Puma concolor / species / 9696

## Multiple operator statements - useful with larger classifications
ex_hierarchy3
#> <Hierarchy>
#>   no. taxon's:  6 
#>   Chordata / phylum / 158852 
#>   Vertebrata / subphylum / 331030 
#>   Teleostei / class / 161105 
#>   Salmonidae / family / 161931 
#>   Salmo / genus / 161994 
#>   Salmo salar / species / 161996
span(ex_hierarchy3, ranks("> genus"), ranks("< phylum"))
#> <Hierarchy>
#>   no. taxon's:  3 
#>   Vertebrata / subphylum / 331030 
#>   Teleostei / class / 161105 
#>   Salmonidae / family / 161931

For more information

This vignettte is meant to be just an outline of what taxa can do. In the future, we plan to release additional, in-depth vignettes for specific topics. More informaiton for specific functions and examples can be found on their man pages by typing the name of the function prefixed by a ? in an R session. For example, ?filter_taxa will pull up the help page for filter_taxa.

Use cases

  • use in binomen:
    • if this pkg does classes, binomen can focus on verbs, e.g., manipulating taxonomic classes, doing split-apply-combine type things
  • use in taxize:
    • as we don't want to break things, probably ideal to have coercion fxns, e.g., as.taxon(), which will convert e.g., the output of get_uid() to a taxa taxonomic class, which we can then go downstream and do things with (i.e., whatever we build on top of the classes)
    • Or we could even have output of get_*() functions do coercion to taxa classes on output since they are just simple S3 classes without print methods right now
  • use in metacoder: This will eventually replace the similar classes used in metacoder.

Contributors

Comments and contributions

We welcome comments, criticisms, and especially contributions! GitHub issues are the preferred way to report bugs, ask questions, or request new features. You can submit issues here:

https://github.com/ropensci/taxa/issues

Meta

  • Please report any issues or bugs.
  • License: MIT
  • Get citation information for taxa in R doing citation(package = 'taxa')
  • Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Functions in taxa

Name Description
classifications Get classifications of taxa
all_functions Get list of usable functions
all_names Return names of data in taxonomy() or taxmap()
ex_hierarchy1 An example Hierarchy object
ex_hierarchy2 An example Hierarchy object
arrange_obs Sort user data in taxmap() objects
check_for_pkg check for packages
check_taxmap_data Check dataset format
convert_base Converts decimal numbers to other bases
is_branch Test if taxa are branches
ex_taxonomy An example Taxonomy object
correct_taxon_names Look up official names from potentially misspelled names
is_internode Test if taxa are "internodes"
n_obs Count observations in taxmap()
arrange_taxa Sort the edge list of taxmap() objects
extract_tax_data Extracts taxonomy info from vectors with regex
n_obs_1 Count observation assigned in taxmap()
parse_raw_heirarchies_to_taxonomy Infer edge list from hierarchies composed of character vectors
parse_tax_data Convert one or more data sets to taxmap
print__vector Generic vector printer
count_capture_groups Count capture groups
data_used Get values of data used in expressions
everything dplyr select_helpers
ex_hierarchies An example hierarchies object
get_dataset Get a data set from a taxmap object
get_dots_or_list Get input from dots or list
can_be_used_in_taxmap Check that a unknown object can be used with taxmap
ex_hierarchy3 An example Hierarchy object
check_class_col Check for name/index in input data
print_item Print a item
ex_taxmap An example taxmap object
map_unique Run a function on unique values of a iterable
matches dplyr select_helpers
name_font Variable name formatting in print methods
names_used Get names of data used in expressions
hierarchies Make a set of many hierarchy() class objects
filtering-helpers Taxonomic filtering helpers
get_data Get data in a taxmap object by name
obs_apply Apply function to observations per taxon
ends_with dplyr select_helpers
one_of dplyr select_helpers
sample_frac_taxa Sample a proportion of taxa from taxonomy() or taxmap()
prefixed_print Print a object with a prefix
error_font Font to indicate an error
sample_n_obs Sample n observations from taxmap()
print__character Print a character
print__matrix Print a matrix
validate_regex_key_pair Check a regex-key pair
validate_regex_match Check that all match input
is_stem Test if taxa are stems
print__numeric Print a numeric
hierarchy The Hierarchy class
leaves Get leaf taxa
init_taxmap_data Convert data input for Taxamp
multi_sep_split Like strsplit, but with multiple separators
remove_redundant_names Remove redundant parts of taxon names
internodes Get "internode" taxa
mutate_obs Add columns to taxmap() objects
replace_taxon_ids Replace taxon ids
n_subtaxa Get number of subtaxa
n_subtaxa_1 Get number of subtaxa
get_data_frame Get data in a taxonomy or taxmap object by name
subtaxa Get subtaxa
get_database_name Return name of database
parse_dataset Parse options specifying datasets
map_data Create a mapping between two variables
highlight_taxon_ids Highlight taxon ID column
subtaxa_apply Apply function to subtaxa of each taxon
taxon_name Taxon name class
map_data_ Create a mapping without NSE
parse_edge_list Convert a table with an edge list to taxmap
id_classifications Get ID classifications of taxa
n_supertaxa Get number of supertaxa
taxon_names Get taxon names
as_id Convert a vector to database IDs
branches Get "branch" taxa
database_list Database list
desc_font Description formatting in print methods
print__factor Print a factor
n_supertaxa_1 Get number of supertaxa
print__integer Print an integer
punc_font Punctuation formatting in print methods
sample_n_taxa Sample n taxa from taxonomy() or taxmap()
ranks_ref Lookup-table for IDs of taxonomic ranks
print__list Print a list
leaves_apply Apply function to leaves of each taxon
select_obs Subset columns in a taxmap() object
filter_obs Filter observations with a list of conditions
length_of_thing Check length of thing
print__logical Print a logical
filter_taxa Filter taxa with a list of conditions
num_range dplyr select_helpers
pick Pick taxa
taxmap Taxmap class
obs Get data indexes associated with taxa
pop Pop taxa out
get_sort_var Get a vector from a vector/list/table to be used in mapping
%>% magrittr forward-pipe operator
supertaxa Get all supertaxa of a taxon
taxon Taxon class
is_leaf Test if taxa are leaves
print__ordered Print a ordered factor
is_root Test if taxa are roots
taxon_rank Taxon rank class
supertaxa_apply Apply function to supertaxa of each taxon
taxon_database Taxonomy database class
taxon_ranks Get taxon ranks
print__tbl_df Print a tibble
transmute_obs Replace columns in taxmap() objects
simplify List to vector of unique elements
unique_mapping Get indexes of a unique set of the input
span Span taxa
starts_with dplyr select_helpers
stems Get stem taxa
taxonomy Taxonomy class
taxonomy_table Convert taxonomy info to a table
lookup_tax_data Convert one or more data sets to taxmap
n_leaves Get number of leaves
limited_print Print a subset of a character vector
taxon_id Taxon ID class
n_leaves_1 Get number of leaves
validate_taxmap_funcs Validate funcs input for Taxamp
parse_heirarchies_to_taxonomy Infer edge list from hierarchies
parse_possibly_named_logical used to parse inputs to drop_obs and reassign_obs
print__data.frame Print a data.frame
print__default_ Print method for unsupported
print_tree Print a text tree
roots Get root taxa
progress_lapply lappy with progress bars
sample_frac_obs Sample a proportion of observations from taxmap()
taxa-package taxa
taxa A class for multiple taxon objects
taxon_ids Get taxon IDs
taxon_indexes Get taxon indexes
tid_font Taxon id formatting in print methods
to_percent Format a proportion as a printed percent
contains dplyr select_helpers
No Results!

Vignettes of taxa

Name
class_diagram.png
class_diagram.svg
parsing_guide.png
taxa-introduction.Rmd
taxa_class_ideas.dia
taxa_class_ideas.png
No Results!

Last month downloads

Details

Type Package
VignetteBuilder knitr
LazyLoad yes
LazyData yes
Encoding UTF-8
License MIT + file LICENSE
URL https://github.com/ropensci/taxa
BugReports https://github.com/ropensci/taxa/issues
RoxygenNote 6.1.1
X-schema.org-applicationCategory Taxonomy
X-schema.org-keywords taxonomy, biology, hierarchy
X-schema.org-isPartOf https://ropensci.org
NeedsCompilation no
Packaged 2018-12-21 05:10:19 UTC; zachary
Repository CRAN
Date/Publication 2019-01-02 22:40:04 UTC

Include our badge in your README

[![Rdoc](http://www.rdocumentation.org/badges/version/taxa)](http://www.rdocumentation.org/packages/taxa)