taxa (version 0.1.0)

lookup_tax_data: Convert one or more data sets to taxmap

Description

Looks up taxonomic data from NCBI sequence IDs, taxon IDs, or taxon names that are present in a dataset. It can also incorporate additional associated datasets.

Usage

lookup_tax_data(tax_data, type, column = 1, datasets = list(),
  mappings = c(), database = "ncbi", include_tax_data = TRUE,
  use_database_ids = TRUE)

Arguments

tax_data

A table, list, or vector that contains sequence IDs, taxon IDs, or taxon names.

  • tables: The column option must be used to specify which column contains the sequence IDs, taxon IDs, or taxon names.

  • lists: There must be only one item per list entry unless the column option is used to specify what item to use in each list entry.

  • vectors: simply a vector of sequence IDs, taxon IDs, or taxon names.
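The three accepted input shapes can be illustrated with a minimal base R sketch. The helper get_lookup_values() is hypothetical and is not part of taxa; it only shows, under the rules listed above, which values would end up being looked up for each kind of input.

```r
# Hypothetical helper (NOT part of taxa): pull the values that would be
# looked up from each supported input shape.
get_lookup_values <- function(tax_data, column = 1) {
  if (is.data.frame(tax_data)) {
    # Tables: one column holds the IDs or names
    as.character(tax_data[[column]])
  } else if (is.list(tax_data)) {
    # Lists: take one item (chosen by `column`) from each entry
    vapply(tax_data,
           function(entry) as.character(entry[[column]]),
           character(1))
  } else {
    # Vectors: the values are used directly
    as.character(tax_data)
  }
}

species_table <- data.frame(species = c("Panthera leo", "Ursus americanus"),
                            species_id = c("A", "C"),
                            stringsAsFactors = FALSE)
species_list <- list(list(id = "A", species = "Panthera leo"),
                     list(id = "C", species = "Ursus americanus"))
species_vector <- c("Panthera leo", "Ursus americanus")

# All three inputs yield the same vector of taxon names
from_table  <- get_lookup_values(species_table, column = "species")
from_list   <- get_lookup_values(species_list, column = "species")
from_vector <- get_lookup_values(species_vector)
```

With real data, the same inputs would be passed as tax_data together with the matching column value.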

type

("seq_id", "taxon_id", "taxon_name") The type of information used to look up the classifications.

column

(character or integer) The name or index of the column that contains the information used to look up classifications. This only applies when a table or list is supplied to tax_data.

datasets

Additional lists/vectors/tables that should be included in the resulting taxmap object. The mappings option is used to specify how these data sets relate to the tax_data and, by inference, what taxa apply to each item.

mappings

(named character) This defines how the taxonomic information in tax_data applies to data in datasets. This option should have the same number of elements as datasets, one for each dataset and in the same order. The name of each element specifies which information in tax_data is shared with the corresponding dataset, and the value specifies the matching information in that dataset. If there are no shared variables, you can add NA as a placeholder, but you could just leave that data out since it does not benefit from being in the taxmap object. The names/values can be one of the following:

  • For tables, the names of columns can be used.

  • "{{index}}" : This means to use the index of rows/items

  • "{{name}}" : This means to use row/item names.

  • "{{value}}" : This means to use the values in vectors or lists. Lists will be converted to vectors using unlist().
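The following base R sketch illustrates how a name/value pair in mappings can relate a dataset to tax_data. It is not the package's implementation: resolve_key() is a hypothetical helper, and the matching with match() is an assumption about what "shared information" means in practice.

```r
# Hypothetical helper (NOT part of taxa): turn one side of a mapping into
# a vector of keys, one per row/item of `data`.
resolve_key <- function(data, key) {
  if (key == "{{index}}") {
    # Position of each row/item
    if (is.data.frame(data)) seq_len(nrow(data)) else seq_along(data)
  } else if (key == "{{name}}") {
    # Row names for tables, item names for lists/vectors
    if (is.data.frame(data)) rownames(data) else names(data)
  } else if (key == "{{value}}") {
    # The values themselves; lists are flattened with unlist()
    unlist(data)
  } else {
    # Otherwise treat the key as a column name of a table
    data[[key]]
  }
}

species_data <- data.frame(species = c("Panthera leo", "Panthera tigris",
                                       "Ursus americanus"),
                           species_id = c("A", "B", "C"),
                           stringsAsFactors = FALSE)
abundance <- data.frame(id = c("A", "B", "C", "A", "B", "C"),
                        counts = c(23, 4, 3, 34, 5, 13),
                        stringsAsFactors = FALSE)
common_names <- c(A = "Lion", B = "Tiger", C = "Bear")

# mappings = c("species_id" = "id"):
# each abundance row points at the tax_data row with the same ID
abundance_rows <- match(resolve_key(abundance, "id"),
                        resolve_key(species_data, "species_id"))
abundance_species <- species_data$species[abundance_rows]

# mappings = c("species_id" = "{{name}}"):
# item names of the vector are matched against the same tax_data column
name_rows <- match(resolve_key(common_names, "{{name}}"),
                   resolve_key(species_data, "species_id"))
```

Here abundance_rows assigns each of the six abundance rows to one of the three rows of species_data, which is the sense in which a dataset inherits the taxa of tax_data through a shared variable.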

database

(character) The name of a database to use to look up classifications. Options include "ncbi", "itis", "eol", "col", "tropicos", and "nbn".

include_tax_data

(TRUE/FALSE) Whether or not to include tax_data as a dataset, like those in datasets.

use_database_ids

(TRUE/FALSE) Whether or not to use the taxon IDs downloaded from the database instead of arbitrary, automatically generated taxon IDs.

See Also

Other parsers: extract_tax_data, parse_tax_data

Examples

# NOT RUN {
  # Make example data with taxonomic classifications
  species_data <- data.frame(tax = c("Mammalia;Carnivora;Felidae",
                                     "Mammalia;Carnivora;Felidae",
                                     "Mammalia;Carnivora;Ursidae"),
                             species = c("Panthera leo",
                                         "Panthera tigris",
                                         "Ursus americanus"),
                             species_id = c("A", "B", "C"))

  # Make example data associated with the taxonomic data
  # Note how this does not contain classifications, but
  # does have a variable in common with "species_data" ("id" = "species_id")
  abundance <- data.frame(id = c("A", "B", "C", "A", "B", "C"),
                          sample_id = c(1, 1, 1, 2, 2, 2),
                          counts = c(23, 4, 3, 34, 5, 13))

  # Make another related data set named by species id
  common_names <- c(A = "Lion", B = "Tiger", C = "Bear", "Oh my!")

  # Make another related data set with no names
  foods <- list(c("ungulates", "boar"),
                c("ungulates", "boar"),
                c("salmon", "fruit", "nuts"))

  # Make a taxmap object with these three datasets
  x <- lookup_tax_data(species_data,
                       type = "taxon_name",
                       datasets = list(counts = abundance,
                                       my_names = common_names,
                                       foods = foods),
                       mappings = c("species_id" = "id",
                                    "species_id" = "{{name}}",
                                    "{{index}}" = "{{index}}"),
                       column = "species")

  # Note how all the datasets have taxon ids now
  x$data

  # This allows for complex mappings between variables that other functions use
  map_data(x, my_names, foods)
  map_data(x, counts, my_names)
# }
