# Introduction to the taxa package

#### subtaxa

The "subtaxa" of a taxon are all those of a finer rank encompassed by that taxon. For example, sapiens is a subtaxon of Homo. The subtaxa function returns all subtaxa for each taxon in a taxonomy object.

```r
subtaxa(tax, value = "taxon_names")
```
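The idea behind subtaxa can be sketched in base R. This is not how the taxa package implements it. The sketch assumes a hypothetical edge list, toy_edges, stored as a plain data frame in which each row links a taxon (to) to its immediate supertaxon (from), with NA marking a taxon that has none. Collecting the subtaxa of a taxon then means walking down the edges one level at a time until a level has no children.

```r
# Hypothetical edge list (a plain data frame, not a taxa object): each row links
# a taxon ("to") to its immediate supertaxon ("from"); NA means "no supertaxon".
toy_edges <- data.frame(
  from = c(NA, "Mammalia", "Mammalia", "Carnivora", "Primates", "Hominidae", "Homo"),
  to   = c("Mammalia", "Carnivora", "Primates", "Felidae", "Hominidae", "Homo", "sapiens"),
  stringsAsFactors = FALSE
)

# Collect every taxon encompassed by `taxon` by walking down the edge list
# one level at a time until a level has no children.
all_subtaxa <- function(taxon, edges) {
  found <- character(0)
  current <- taxon
  while (length(current) > 0) {
    children <- edges$to[edges$from %in% current]  # NA never matches %in%
    found <- c(found, children)
    current <- children
  }
  found
}

all_subtaxa("Primates", toy_edges)  # returns c("Hominidae", "Homo", "sapiens")

# One result per taxon, named by taxon
lapply(setNames(toy_edges$to, toy_edges$to), all_subtaxa, edges = toy_edges)
```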

#### roots

We call taxa that have no supertaxa "roots". The roots function returns these taxa.

```r
roots(tax, value = "taxon_names")
```
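As a rough base R illustration (not the package's implementation), suppose the taxonomy is a hypothetical edge list, toy_edges, where each row pairs a taxon (to) with its supertaxon (from) and NA means the taxon has no supertaxon. A root is then simply a taxon whose from entry is missing:

```r
# Hypothetical edge list: "to" is a taxon, "from" its supertaxon (NA if none)
toy_edges <- data.frame(
  from = c(NA, "Mammalia", NA, "Solanaceae"),
  to   = c("Mammalia", "Felidae", "Solanaceae", "Solanum"),
  stringsAsFactors = FALSE
)

# Roots are the taxa with no supertaxon
find_roots <- function(edges) {
  edges$to[is.na(edges$from)]
}

find_roots(toy_edges)  # returns c("Mammalia", "Solanaceae")
```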

#### leaves

We call taxa without any subtaxa "leaves". The leaves function returns these taxa.

```r
leaves(tax, value = "taxon_names")
```
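Leaves can be found in a similar way. In this base R sketch, which assumes a hypothetical edge list (toy_edges) with a from column for the supertaxon and a to column for the taxon, a taxon is a leaf if it never appears as anyone's supertaxon. It illustrates the definition only, not how leaves is implemented.

```r
# Hypothetical edge list: "to" is a taxon, "from" its supertaxon (NA if none)
toy_edges <- data.frame(
  from = c(NA, "Mammalia", NA, "Solanaceae"),
  to   = c("Mammalia", "Felidae", "Solanaceae", "Solanum"),
  stringsAsFactors = FALSE
)

# Leaves are the taxa that never appear as a supertaxon
find_leaves <- function(edges) {
  edges$to[!(edges$to %in% edges$from)]
}

find_leaves(toy_edges)  # returns c("Felidae", "Solanum")
```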

#### other functions

There are many other functions for interacting with taxonomy objects, such as stems and n_subtaxa, but they are not described here.

### The "taxmap" class

The taxmap class is used to store any number of tables, lists, or vectors associated with taxa. It is basically the same as the taxonomy class, but with the following additions:

- A list called data that stores arbitrary user data associated with taxa
- A list called funcs that stores user-defined functions

```r
info <- data.frame(name = c("tiger", "cat", "mole", "human", "tomato", "potato"),
                   n_legs = c(4, 4, 4, 2, 0, 0),
                   dangerous = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE))

phylopic_ids <- c("e148eabb-f138-43c6-b1e4-5cda2180485a",
                  "12899ba0-9923-4feb-a7f9-758c3c7d5e13",
                  "11b783d5-af1c-4f4e-8ab5-a51470652b47",
                  "9fae30cd-fb59-4a81-a39c-e1826a35f612",
                  "b6400f39-345a-4711-ab4f-92fd4e22cb1a",
                  "63604565-0406-460b-8cb8-1abe954b3f3a")

foods <- list(c("mammals", "birds"),
              c("cat food", "mice"),
              c("insects"),
              c("Most things, but especially anything rare or expensive"),
              c("light", "dirt"),
              c("light", "dirt"))

# Returns one message per row of the "info" table
reaction <- function(x) {
  ifelse(x$data$info$dangerous,
         paste0("Watch out! That ", x$data$info$name, " might attack!"),
         paste0("No worries; it's just a ", x$data$info$name, "."))
}

my_taxmap <- taxmap(tiger, cat, mole, human, tomato, potato,
                    data = list(info = info,
                                phylopic_ids = phylopic_ids,
                                foods = foods),
                    funcs = list(reaction = reaction))
```

In most functions that work with taxmap objects, the names of list/vector datasets, table columns, or functions can be used as if they were separate variables. For functions, the result of calling the function is returned rather than the function itself. To see which names can be used this way, use all_names.

```r
all_names(my_taxmap)
```

For example, my_taxmap$data$info$n_legs and n_legs have the same effect inside manipulation functions like filter_taxa, described below. To get the values of these variables, use get_data.

```r
get_data(my_taxmap)
```
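Being able to use column names as if they were variables rests on R's non-standard evaluation. As a rough illustration only (this is not how taxa resolves names, and it covers table columns but not list datasets or functions), an unquoted expression can be captured with substitute() and evaluated with a table's columns in scope. The table toy_info and the helper eval_in_data are hypothetical.

```r
# Hypothetical stand-in for one table in a taxmap's data list
toy_info <- data.frame(
  name = c("tiger", "cat", "human"),
  n_legs = c(4, 4, 2),
  dangerous = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Capture an unquoted expression and evaluate it with the table's columns
# in scope; names not found there are looked up in the caller's environment.
eval_in_data <- function(expr, data) {
  eval(substitute(expr), envir = data, enclos = parent.frame())
}

eval_in_data(n_legs > 2, toy_info)        # returns c(TRUE, TRUE, FALSE)
eval_in_data(name[dangerous], toy_info)   # returns c("tiger", "human")
```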

Note how "taxon_names" and "dangerous" are used below.

#### Filtering

In addition to all of the functions like subtaxa that work with taxonomy objects, taxmap has a set of functions, based on dplyr, for manipulating data in a taxonomic context. Like many operations on taxmap objects, these come in pairs: one function acts on the taxa and the other on the associated data, which we call "observations". For example, filter_taxa and filter_obs filter taxa and observations, respectively. Here we use filter_taxa to subset all taxa with a name starting with "t":

```r
filter_taxa(my_taxmap, startsWith(taxon_names, "t"))
```

Any number of filters can be supplied, and each must resolve to a TRUE/FALSE vector, taxon IDs, or edge list indexes. In the example below, the second filter is a taxon ID:

```r
filter_taxa(my_taxmap, startsWith(taxon_names, "t"), "r")
```

filter_taxa has many options that make it very flexible. For example, setting supertaxa = TRUE also preserves all the supertaxa of the selected taxa.

```r
filter_taxa(my_taxmap, startsWith(taxon_names, "t"), supertaxa = TRUE)
```
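What supertaxa = TRUE adds to a selection can be sketched in base R with a hypothetical edge list (toy_edges, with from as the supertaxon and to as the taxon): starting from the selected taxa, repeatedly look up each taxon's supertaxon until a root is reached. The helper add_supertaxa is illustrative and is not the package's code.

```r
# Hypothetical edge list: "to" is a taxon, "from" its supertaxon (NA if none)
toy_edges <- data.frame(
  from = c(NA, "Mammalia", "Mammalia", "Carnivora", "Primates"),
  to   = c("Mammalia", "Carnivora", "Primates", "Felidae", "Hominidae"),
  stringsAsFactors = FALSE
)

# Extend a selection with all ancestors of the selected taxa
add_supertaxa <- function(selected, edges) {
  keep <- selected
  current <- selected
  repeat {
    parents <- edges$from[edges$to %in% current]
    parents <- setdiff(parents[!is.na(parents)], keep)  # drop roots' NA and repeats
    if (length(parents) == 0) break
    keep <- c(keep, parents)
    current <- parents
  }
  keep
}

selected <- toy_edges$to[startsWith(toy_edges$to, "F")]  # "Felidae"
add_supertaxa(selected, toy_edges)  # returns c("Felidae", "Carnivora", "Mammalia")

# The filtered edge list keeps only the rows for retained taxa
toy_edges[toy_edges$to %in% add_supertaxa(selected, toy_edges), ]
```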

The filter_obs function works in a similar way, but subsets the observations in my_taxmap$data.

```r
filter_obs(my_taxmap, "info", dangerous == TRUE)
```

#### Sampling

The sample_n_obs and sample_n_taxa functions are similar to filter_obs and filter_taxa, except that taxa or observations are chosen randomly. All of the options of the "filter" functions are available to the "sample" functions.

```r
set.seed(1)
sample_n_taxa(my_taxmap, 3)

set.seed(1)
sample_n_taxa(my_taxmap, 3, supertaxa = TRUE)
```
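Calling set.seed(1) before each sampling call makes the random choice reproducible, because each call starts from the same random state. The base R sketch below shows this with sample() on a hypothetical vector of taxon names (toy_taxa); whether sample_n_taxa draws in exactly this way is an assumption.

```r
# Hypothetical vector of taxon names
toy_taxa <- c("Mammalia", "Carnivora", "Primates", "Felidae", "Hominidae")

set.seed(1)
draw1 <- sample(toy_taxa, size = 3)  # three taxa, without replacement

set.seed(1)
draw2 <- sample(toy_taxa, size = 3)  # same seed, so the same random state

identical(draw1, draw2)  # TRUE
```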

#### Adding columns

Adding columns to tabular datasets is done using mutate_obs.

```r
mutate_obs(my_taxmap, "info",
           new_col = "I'm new",
           newer_col = paste0(new_col, "er!"))
```
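Here newer_col refers to new_col, which is created in the same call. In base R, within() offers a similar sequential behavior, as this sketch on a hypothetical table (toy_info) shows. By contrast, transform() evaluates every argument against the original columns, so the same code would fail there with "object 'new_col' not found".

```r
# Hypothetical stand-in for the "info" table
toy_info <- data.frame(name = c("tiger", "cat"), stringsAsFactors = FALSE)

# within() runs the assignments in order, so newer_col can use new_col;
# the length-1 values are recycled to every row
toy_result <- within(toy_info, {
  new_col <- "I'm new"
  newer_col <- paste0(new_col, "er!")
})

toy_result$newer_col  # returns c("I'm newer!", "I'm newer!")
```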

#### Subsetting columns

Subsetting columns in tabular datasets is done using select_obs.

```r
# Selecting a column by name
select_obs(my_taxmap, "info", dangerous)

# Selecting a column by index
select_obs(my_taxmap, "info", 3)

# Selecting a column by regular expression
select_obs(my_taxmap, "info", matches("^dange"))
```
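For comparison, the three selection styles above correspond to ordinary base R indexing of a data frame. This sketch uses a hypothetical stand-in for the info table (toy_info) and does not involve taxa at all:

```r
# Hypothetical stand-in for the "info" table
toy_info <- data.frame(name = c("tiger", "cat"),
                       n_legs = c(4, 4),
                       dangerous = c(TRUE, FALSE),
                       stringsAsFactors = FALSE)

by_name  <- toy_info[, "dangerous", drop = FALSE]                       # by name
by_index <- toy_info[, 3, drop = FALSE]                                 # by position
by_regex <- toy_info[, grepl("^dange", names(toy_info)), drop = FALSE]  # by regex

identical(by_name, by_index) && identical(by_name, by_regex)  # TRUE
```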

#### Sorting

Sorting the edge list and the observations is done using arrange_taxa and arrange_obs.

```r
arrange_taxa(my_taxmap, taxon_names)
arrange_obs(my_taxmap, "info", name)
```
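On a plain data frame, the closest base R analogue of sorting a table by a column is indexing with order(). The table toy_info here is a hypothetical stand-in:

```r
# Hypothetical stand-in for the "info" table
toy_info <- data.frame(name = c("tiger", "cat", "mole"),
                       n_legs = c(4, 4, 4),
                       stringsAsFactors = FALSE)

sorted <- toy_info[order(toy_info$name), ]
sorted$name  # returns c("cat", "mole", "tiger")

# Descending order
toy_info[order(toy_info$name, decreasing = TRUE), "name"]  # returns c("tiger", "mole", "cat")
```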

#### Parsing data

The taxmap class can contain and manipulate very complex data. However, this can make it difficult to parse data into a taxmap object. For this reason, there are three functions (parse_tax_data, lookup_tax_data, and extract_tax_data) to help create taxmap objects from nearly any kind of data that a taxonomy can be associated with or derived from. The figure below shows simplified versions of how to create taxmap objects from different types of data in different formats.

[Figure: guide to creating taxmap objects from different types of data (parsing_guide.png)]

In addition to the functionality above, parse_tax_data and lookup_tax_data can include additional datasets that are somehow associated with the source datasets (e.g. they share a common identifier). Elements in these datasets are assigned the taxa defined in the source data, so functions like filter_taxa and filter_obs work on all of the datasets at once.
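The two ideas in this section, deriving a taxonomy from the data and assigning those taxa to a second dataset through a shared identifier, can be sketched in base R. Everything here is hypothetical: the column names, the ";"-separated classification format, and the use of plain names (rather than taxon IDs) to identify taxa. The real parsing functions handle far more cases than this.

```r
# Hypothetical source data: a ";"-separated classification per row plus an ID
source_data <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  classification = c("Mammalia;Carnivora;Felidae",
                     "Mammalia;Primates;Hominidae",
                     "Mammalia;Carnivora;Felidae"),
  stringsAsFactors = FALSE
)
ranks_per_row <- strsplit(source_data$classification, ";")

# Derive a taxonomy: one (supertaxon, taxon) edge per adjacent pair of names
toy_edges <- unique(do.call(rbind, lapply(ranks_per_row, function(x) {
  data.frame(from = c(NA, head(x, -1)), to = x, stringsAsFactors = FALSE)
})))
nrow(toy_edges)  # 5 distinct edges

# Assign each source row the finest taxon in its classification
source_data$taxon <- vapply(ranks_per_row, function(x) x[length(x)], character(1))

# A second dataset that shares sample_id with the source data
counts <- data.frame(sample_id = c("s3", "s1", "s2"),
                     reads = c(10, 25, 7),
                     stringsAsFactors = FALSE)
counts$taxon <- source_data$taxon[match(counts$sample_id, source_data$sample_id)]
counts$taxon  # returns c("Felidae", "Felidae", "Hominidae")
```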

## Parsing Hierarchy and hierarchies objects

A set of functions is available for parsing objects of class Hierarchy and hierarchies. These functions are being ported from the CRAN package binomen.

The functions below are "taxonomically aware", so you can use, for example, the > and < operators to filter your taxonomic name data.
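One way to make rank comparisons "taxonomically aware" is an ordered factor whose levels run from the finest rank to the coarsest, so that "greater than genus" means coarser than genus. This base R sketch illustrates the idea under that assumption and is not the package's implementation; the short rank list and the hierarchy toy_hier are illustrative.

```r
# Ranks ordered from finest to coarsest, so ">" means "coarser than"
rank_levels <- c("species", "genus", "family", "order", "class", "phylum", "kingdom")

toy_hier <- data.frame(
  name = c("Poales", "Poaceae", "Poa", "Poa annua"),
  rank = factor(c("order", "family", "genus", "species"),
                levels = rank_levels, ordered = TRUE),
  stringsAsFactors = FALSE
)

toy_hier$name[toy_hier$rank > "genus"]   # returns c("Poales", "Poaceae")
toy_hier$name[toy_hier$rank <= "genus"]  # returns c("Poa", "Poa annua")
```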

### pick

pick() - Pick out specific taxa, while others are dropped

```r
ex_hierarchy1

# specific ranks by rank name
pick(ex_hierarchy1, ranks("family"))

# two elements by taxonomic name
pick(ex_hierarchy1, nms("Poaceae", "Poa"))

# two elements by taxonomic identifier
pick(ex_hierarchy1, ids(4479, 4544))

# combine types
pick(ex_hierarchy1, ranks("family"), ids(4544))
```
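In base R terms, pick keeps the elements that match a selector. The sketch below represents a hierarchy as a hypothetical data frame of names and ranks (toy_hier) and keeps any row that matches at least one of the given ranks or names. pick_sketch is an illustrative helper, not part of taxa, and how pick itself combines several selectors is not assumed here.

```r
# Hypothetical hierarchy as a plain data frame
toy_hier <- data.frame(
  name = c("Poales", "Poaceae", "Poa", "Poa annua"),
  rank = c("order", "family", "genus", "species"),
  stringsAsFactors = FALSE
)

# Keep rows matching any of the given ranks or names (hypothetical helper)
pick_sketch <- function(h, by_rank = character(0), by_name = character(0)) {
  h[h$rank %in% by_rank | h$name %in% by_name, , drop = FALSE]
}

pick_sketch(toy_hier, by_rank = "family")$name                   # returns "Poaceae"
pick_sketch(toy_hier, by_name = c("Poaceae", "Poa"))$name        # returns c("Poaceae", "Poa")
pick_sketch(toy_hier, by_rank = "family", by_name = "Poa")$name  # returns c("Poaceae", "Poa")
```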

### pop

pop() - Pop out taxa, that is, drop them

```r
ex_hierarchy1

# specific ranks by rank name
pop(ex_hierarchy1, ranks("family"))

# two elements by taxonomic name
pop(ex_hierarchy1, nms("Poaceae", "Poa"))

# two elements by taxonomic identifier
pop(ex_hierarchy1, ids(4479, 4544))

# combine types
pop(ex_hierarchy1, ranks("family"), ids(4544))
```
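pop works as the complement of pick. Representing a hierarchy as a hypothetical data frame of names and ranks (toy_hier), a base R sketch drops every row that matches a rank or name selector. pop_sketch is an illustrative helper, not part of taxa.

```r
# Hypothetical hierarchy as a plain data frame
toy_hier <- data.frame(
  name = c("Poales", "Poaceae", "Poa", "Poa annua"),
  rank = c("order", "family", "genus", "species"),
  stringsAsFactors = FALSE
)

# Drop rows matching any of the given ranks or names (hypothetical helper)
pop_sketch <- function(h, by_rank = character(0), by_name = character(0)) {
  h[!(h$rank %in% by_rank | h$name %in% by_name), , drop = FALSE]
}

pop_sketch(toy_hier, by_rank = "family")$name             # returns c("Poales", "Poa", "Poa annua")
pop_sketch(toy_hier, by_name = c("Poaceae", "Poa"))$name  # returns c("Poales", "Poa annua")
```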

### span

span() - Select a range of taxa, either by two names, or relational operators

```r
ex_hierarchy1

# keep all taxa between family and genus
# - by rank name, taxonomic name or ID
span(ex_hierarchy1, nms("Poaceae", "Poa"))

# keep all taxa greater than genus
span(ex_hierarchy1, ranks("> genus"))

# keep all taxa greater than or equal to genus
span(ex_hierarchy1, ranks(">= genus"))

# keep all taxa less than Felidae
span(ex_hierarchy2, nms("< Felidae"))

## Multiple operator statements - useful with larger classifications
ex_hierarchy3
span(ex_hierarchy3, ranks("> genus"), ranks("< phylum"))
```
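Both forms of span can be sketched in base R on a hypothetical hierarchy (toy_hier) stored from coarsest to finest. By names, the sketch keeps every row between the two named taxa. By ranks, it keeps rows whose rank lies strictly between two bounds, using an ordered factor that runs from the finest to the coarsest rank. span_names and span_ranks are illustrative helpers, not taxa functions.

```r
rank_levels <- c("species", "genus", "family", "order", "class", "phylum", "kingdom")

# Hypothetical hierarchy, stored from coarsest to finest
toy_hier <- data.frame(
  name = c("Plantae", "Poales", "Poaceae", "Poa", "Poa annua"),
  rank = factor(c("kingdom", "order", "family", "genus", "species"),
                levels = rank_levels, ordered = TRUE),
  stringsAsFactors = FALSE
)

# Keep every row from one named taxon to another, inclusive
span_names <- function(h, a, b) {
  i <- match(c(a, b), h$name)
  h[seq(min(i), max(i)), , drop = FALSE]
}

# Keep rows whose rank is coarser than `lower` and finer than `upper`
span_ranks <- function(h, lower, upper) {
  h[h$rank > lower & h$rank < upper, , drop = FALSE]
}

span_names(toy_hier, "Poaceae", "Poa")$name    # returns c("Poaceae", "Poa")
span_ranks(toy_hier, "genus", "kingdom")$name  # returns c("Poales", "Poaceae")
```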