
Note: a newer version (0.10.0) of this package is available.

taxize

taxize allows users to search many taxonomic data sources for species names (scientific and common) and to download upstream and downstream taxonomic hierarchy information, among other things.

The taxize tutorial can be found at https://ropensci.org/tutorials/taxize.html.

The functions in the package that hit a specific API have a prefix and suffix separated by an underscore. They follow the format of service_whatitdoes. For example, gnr_resolve uses the Global Names Resolver API to resolve species names. General functions in the package that don't hit a specific API don't have two words separated by an underscore, e.g., classification.

You need API keys for Encyclopedia of Life (EOL) and Tropicos.

SOAP

Note that a few data sources require SOAP web services, which are difficult to support in R across all operating systems. So far these include the World Register of Marine Species, the Pan-European Species directories Infrastructure, and Mycobank. Data sources that use SOAP web services have been moved to a new package called taxizesoap. Find it at https://github.com/ropensci/taxizesoap.

Currently implemented in taxize

**: There are none! We suggest using the TPL and TPLck functions in the Taxonstand package. We provide two functions to get bulk data: tpl_families and tpl_get.

***: There are none! The function scrapes the web directly.

May be in taxize in the future...

See the newdatasource tag in the issue tracker

Tutorial

For more examples, see the tutorial.

Installation

Stable version from CRAN

install.packages("taxize")

Development version from GitHub

Windows users should install Rtools first.

install.packages("devtools")
devtools::install_github("ropensci/taxize")
library('taxize')

Get unique taxonomic identifier from NCBI

A lot of taxize revolves around taxonomic identifiers. Because, as you know, names can be a mess (misspelled, synonyms, etc.), it's better to get an identifier that a particular data source knows about; then we can move on to acquiring more fun taxonomic data.

uids <- get_uid(c("Chironomus riparius", "Chaetopteryx"))

Retrieve classifications

Classifications: think of a species, then all the taxonomic ranks up from that species, like genus, family, order, class, and kingdom.

out <- classification(uids)
lapply(out, head)
#> $`315576`
#>                 name         rank     id
#> 1 cellular organisms      no rank 131567
#> 2          Eukaryota superkingdom   2759
#> 3       Opisthokonta      no rank  33154
#> 4            Metazoa      kingdom  33208
#> 5          Eumetazoa      no rank   6072
#> 6          Bilateria      no rank  33213
#> 
#> $`492549`
#>                 name         rank     id
#> 1 cellular organisms      no rank 131567
#> 2          Eukaryota superkingdom   2759
#> 3       Opisthokonta      no rank  33154
#> 4            Metazoa      kingdom  33208
#> 5          Eumetazoa      no rank   6072
#> 6          Bilateria      no rank  33213
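Each element of the classification() result is a data frame with name, rank and id columns, as printed above. The sketch below rebuilds the six rows shown for 315576 as a plain data frame so it runs offline without taxize; with the real result you would index it instead (e.g. out[["315576"]]). The helper name_at_rank() is hypothetical, not a taxize function.

```r
# Plain data frame copied from the first six rows printed above for 315576
cls <- data.frame(
  name = c("cellular organisms", "Eukaryota", "Opisthokonta",
           "Metazoa", "Eumetazoa", "Bilateria"),
  rank = c("no rank", "superkingdom", "no rank",
           "kingdom", "no rank", "no rank"),
  id = c(131567, 2759, 33154, 33208, 6072, 33213),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the name at a given rank, or NA if absent
name_at_rank <- function(x, rank) {
  hit <- x$name[x$rank == rank]
  if (length(hit) == 0) NA_character_ else hit[1]
}

# Drop unranked ("no rank") nodes to get a lineage of formal ranks only
ranked <- cls[cls$rank != "no rank", ]

name_at_rank(cls, "kingdom")  # "Metazoa"
name_at_rank(cls, "genus")    # NA (genus is not among the rows shown)
ranked$name                   # "Eukaryota" "Metazoa"
```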

Immediate children

Get immediate children of Salmo. In this case, Salmo is a genus, so this gives species within the genus.

children("Salmo", db = 'ncbi')
#> $Salmo
#>    childtaxa_id                   childtaxa_name childtaxa_rank
#> 1       1509524  Salmo marmoratus x Salmo trutta        species
#> 2       1484545 Salmo cf. cenerinus BOLD:AAB3872        species
#> 3       1483130               Salmo zrmanjaensis        species
#> 4       1483129               Salmo visovacensis        species
#> 5       1483128                Salmo rhodanensis        species
#> 6       1483127                 Salmo pellegrini        species
#> 7       1483126                     Salmo opimus        species
#> 8       1483125                Salmo macedonicus        species
#> 9       1483124                Salmo lourosensis        species
#> 10      1483123                   Salmo labecula        species
#> 11      1483122                  Salmo farioides        species
#> 12      1483121                      Salmo chilo        species
#> 13      1483120                     Salmo cettii        species
#> 14      1483119                  Salmo cenerinus        species
#> 15      1483118                   Salmo aphelios        species
#> 16      1483117                    Salmo akairos        species
#> 17      1201173               Salmo peristericus        species
#> 18      1035833                   Salmo ischchan        species
#> 19       700588                     Salmo labrax        species
#> 20       237411              Salmo obtusirostris        species
#> 21       235141              Salmo platycephalus        species
#> 22       234793                    Salmo letnica        species
#> 23        62065                  Salmo ohridanus        species
#> 24        33518                 Salmo marmoratus        species
#> 25        33516                    Salmo fibreni        species
#> 26        33515                     Salmo carpio        species
#> 27         8032                     Salmo trutta        species
#> 28         8030                      Salmo salar        species
#> 
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "ncbi"
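The NCBI children of Salmo include hybrid formulas and provisional names alongside formally described species. A minimal sketch of filtering those out, using a few rows copied from the output above so it runs offline; the pattern (" x ", "cf.", "BOLD:") is an assumption chosen for this example, not a taxize feature.

```r
# A few rows copied from the children("Salmo", db = 'ncbi') output above
salmo <- data.frame(
  childtaxa_id = c(1509524, 1484545, 8032, 8030),
  childtaxa_name = c("Salmo marmoratus x Salmo trutta",
                     "Salmo cf. cenerinus BOLD:AAB3872",
                     "Salmo trutta", "Salmo salar"),
  childtaxa_rank = "species",
  stringsAsFactors = FALSE
)

# Flag hybrid formulas (" x ") and provisional names ("cf.", BOLD bins)
informal <- grepl(" x |cf\\.|BOLD:", salmo$childtaxa_name)

# Keep only the remaining names
salmo[!informal, "childtaxa_name"]  # "Salmo trutta" "Salmo salar"
```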

Downstream children to a rank

Get all species in the genus Apis

downstream("Apis", db = 'itis', downto = 'Species', verbose = FALSE)
#> $Apis
#>      tsn parentname parenttsn          taxonname rankid rankname
#> 1 154396       Apis    154395     Apis mellifera    220  species
#> 2 763550       Apis    154395 Apis andreniformis    220  species
#> 3 763551       Apis    154395        Apis cerana    220  species
#> 4 763552       Apis    154395       Apis dorsata    220  species
#> 5 763553       Apis    154395        Apis florea    220  species
#> 6 763554       Apis    154395 Apis koschevnikovi    220  species
#> 7 763555       Apis    154395   Apis nigrocincta    220  species
#> 
#> attr(,"class")
#> [1] "downstream"
#> attr(,"db")
#> [1] "itis"

Upstream taxa

Get all genera up from the species Pinus contorta (this includes the species' own genus and the other genera in the same family).

upstream("Pinus contorta", db = 'itis', upto = 'Genus', verbose=FALSE)
#> $`Pinus contorta`
#>      tsn parentname parenttsn   taxonname rankid rankname
#> 1  18031   Pinaceae     18030       Abies    180    genus
#> 2  18033   Pinaceae     18030       Picea    180    genus
#> 3  18035   Pinaceae     18030       Pinus    180    genus
#> 4 183396   Pinaceae     18030       Tsuga    180    genus
#> 5 183405   Pinaceae     18030      Cedrus    180    genus
#> 6 183409   Pinaceae     18030       Larix    180    genus
#> 7 183418   Pinaceae     18030 Pseudotsuga    180    genus
#> 8 822529   Pinaceae     18030  Keteleeria    180    genus
#> 9 822530   Pinaceae     18030 Pseudolarix    180    genus
#> 
#> attr(,"class")
#> [1] "upstream"
#> attr(,"db")
#> [1] "itis"
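Because upstream() returns the species' own genus together with the other genera in the family, you may want to drop the own genus to keep only the co-genera. A small base R sketch, using the taxonname values copied from the output above and assuming the genus is the first word of the binomial:

```r
# Genus names copied from the upstream() output above
genera <- c("Abies", "Picea", "Pinus", "Tsuga", "Cedrus", "Larix",
            "Pseudotsuga", "Keteleeria", "Pseudolarix")

# The species' own genus is assumed to be the first word of the binomial
own_genus <- strsplit("Pinus contorta", " ", fixed = TRUE)[[1]][1]

# Remaining genera in the same family
co_genera <- setdiff(genera, own_genus)
co_genera
```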

Get synonyms

synonyms("Acer drummondii", db="itis")
#> $`Acer drummondii`
#>   sub_tsn                    acc_name acc_tsn                    syn_name
#> 1  183671 Acer rubrum var. drummondii  526853 Acer rubrum ssp. drummondii
#> 2  183671 Acer rubrum var. drummondii  526853             Acer drummondii
#> 3  183671 Acer rubrum var. drummondii  526853          Rufacer drummondii
#>   syn_tsn
#> 1   28730
#> 2  183671
#> 3  183672

Get taxonomic IDs from many sources

get_ids(names="Salvelinus fontinalis", db = c('itis', 'ncbi'), verbose=FALSE)
#> $itis
#> Salvelinus fontinalis 
#>              "162003" 
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=162003"
#> attr(,"class")
#> [1] "tsn"
#> 
#> $ncbi
#> Salvelinus fontinalis 
#>                "8038" 
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "http://www.ncbi.nlm.nih.gov/taxonomy/8038"
#> 
#> attr(,"class")
#> [1] "ids"
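The match status and record link printed above are stored as ordinary R attributes on each ID vector, so base R's attr() can read them. The sketch below rebuilds the $itis element with structure() from the printed output so it runs without taxize or a network connection; with a real result you would work on out$itis directly.

```r
# Stand-in for the $itis element printed above, rebuilt with structure()
itis_id <- structure(
  c(`Salvelinus fontinalis` = "162003"),
  match = "found",
  multiple_matches = FALSE,
  pattern_match = FALSE,
  uri = "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=162003",
  class = "tsn"
)

attr(itis_id, "match")        # "found"
attr(itis_id, "uri")          # link to the ITIS record
names(itis_id)                # "Salvelinus fontinalis"
as.vector(unclass(itis_id))   # bare TSN "162003", all attributes dropped
```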

You can limit the results to certain rows when getting IDs with any of the get_*() functions:

get_ids(names="Poa annua", db = "gbif", rows=1)
#> $gbif
#> Poa annua 
#> "2704179" 
#> attr(,"class")
#> [1] "gbifid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] TRUE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "http://www.gbif.org/species/2704179"
#> 
#> attr(,"class")
#> [1] "ids"

Furthermore, you can get back all IDs, if that's your jam, with the get_*_() functions (the get_*() functions with an additional underscore at the end of the function name):

get_ids_(c("Chironomus riparius", "Pinus contorta"), db = 'nbn', rows=1:3)
#> $nbn
#> $nbn$`Chironomus riparius`
#>   ptaxonversionkey    searchmatchtitle    rank  namestatus
#> 1 NBNSYS0000027573 Chironomus riparius species Recommended
#> 2 NHMSYS0001718042   Elaphrus riparius species Recommended
#> 3 NBNSYS0000023345   Paederus riparius species Recommended
#> 
#> $nbn$`Pinus contorta`
#>   ptaxonversionkey               searchmatchtitle       rank  namestatus
#> 1 NHMSYS0000494848   Pinus contorta var. contorta    variety Recommended
#> 2 NBNSYS0000004786                 Pinus contorta    species Recommended
#> 3 NHMSYS0000494848 Pinus contorta subsp. contorta subspecies Recommended
#> 
#> 
#> attr(,"class")
#> [1] "ids"
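Note that the first candidate row is not necessarily the exact match: for Pinus contorta, row 1 is the variety and the species itself is row 2. One way to pick the exact match is to compare searchmatchtitle with the query string, sketched here on the rows copied from the output above (filtering on exact title equality is an assumption of this example, not something taxize does for you).

```r
# The Pinus contorta rows printed above, as a plain data frame
pc <- data.frame(
  ptaxonversionkey = c("NHMSYS0000494848", "NBNSYS0000004786", "NHMSYS0000494848"),
  searchmatchtitle = c("Pinus contorta var. contorta", "Pinus contorta",
                       "Pinus contorta subsp. contorta"),
  rank = c("variety", "species", "subspecies"),
  namestatus = "Recommended",
  stringsAsFactors = FALSE
)

# Keep only the row whose title equals the query exactly
query <- "Pinus contorta"
exact <- pc[pc$searchmatchtitle == query, ]
exact$ptaxonversionkey  # "NBNSYS0000004786"
```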

Common names from scientific names

sci2comm('Helianthus annuus', db = 'itis')
#> $`Helianthus annuus`
#> [1] "common sunflower" "sunflower"        "wild sunflower"  
#> [4] "annual sunflower"

Scientific names from common names

comm2sci("black bear", db = "itis")
#> $`black bear`
#> [1] "Ursus americanus luteolus"   "Ursus americanus"           
#> [3] "Ursus americanus"            "Ursus americanus americanus"
#> [5] "Ursus thibetanus"            "Ursus thibetanus"           
#> [7] "Chiropotes satanas"
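The result above repeats some names (presumably one entry per matching ITIS record), and a name like Chiropotes satanas suggests the common-name search can also match loosely. A minimal sketch of collapsing the duplicates, using the printed result rebuilt as a named list:

```r
# The comm2sci() result printed above, rebuilt as a named list
bb <- list(`black bear` = c(
  "Ursus americanus luteolus", "Ursus americanus",
  "Ursus americanus", "Ursus americanus americanus",
  "Ursus thibetanus", "Ursus thibetanus",
  "Chiropotes satanas"
))

# Drop repeated names within each query's result
deduped <- lapply(bb, unique)
deduped[["black bear"]]
```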

Lowest common rank among taxa

spp <- c("Sus scrofa", "Homo sapiens", "Nycticebus coucang")
lowest_common(spp, db = "ncbi")
#>             name        rank      id
#> 21 Boreoeutheria below-class 1437010
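Conceptually, the lowest common taxon is the deepest node shared by every lineage. The sketch below illustrates that idea on hand-written, simplified lineages (root first) using base R's intersect(); it is not how lowest_common() is implemented. Because these lineages omit NCBI's unranked intermediate clades, the sketch stops at Mammalia, whereas NCBI reports Boreoeutheria.

```r
# Hand-written, simplified lineages ordered from root to tip
# (these omit the many intermediate clades present in NCBI)
lineages <- list(
  "Sus scrofa"         = c("Eukaryota", "Metazoa", "Chordata", "Mammalia",
                           "Artiodactyla", "Suidae", "Sus"),
  "Homo sapiens"       = c("Eukaryota", "Metazoa", "Chordata", "Mammalia",
                           "Primates", "Hominidae", "Homo"),
  "Nycticebus coucang" = c("Eukaryota", "Metazoa", "Chordata", "Mammalia",
                           "Primates", "Lorisidae", "Nycticebus")
)

# Deepest name shared by every lineage; intersect() keeps the order of
# its first argument, so the last shared element is the deepest one
lowest_common_sketch <- function(lins) {
  shared <- Reduce(intersect, lins)
  if (length(shared) == 0) return(NA_character_)
  shared[length(shared)]
}

lowest_common_sketch(lineages)                                          # "Mammalia"
lowest_common_sketch(lineages[c("Homo sapiens", "Nycticebus coucang")]) # "Primates"
```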

Coerce codes to taxonomic id classes

numeric to uid

as.uid(315567)
#> [1] "315567"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "http://www.ncbi.nlm.nih.gov/taxonomy/315567"

list to uid

as.uid(list("315567", "3339", "9696"))
#> [1] "315567" "3339"   "9696"  
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found" "found" "found"
#> attr(,"multiple_matches")
#> [1] FALSE FALSE FALSE
#> attr(,"pattern_match")
#> [1] FALSE FALSE FALSE
#> attr(,"uri")
#> [1] "http://www.ncbi.nlm.nih.gov/taxonomy/315567"
#> [2] "http://www.ncbi.nlm.nih.gov/taxonomy/3339"  
#> [3] "http://www.ncbi.nlm.nih.gov/taxonomy/9696"

Coerce taxonomic id classes to a data.frame

out <- as.uid(c(315567, 3339, 9696))
(res <- data.frame(out))
#>      ids class match multiple_matches pattern_match
#> 1 315567   uid found            FALSE         FALSE
#> 2   3339   uid found            FALSE         FALSE
#> 3   9696   uid found            FALSE         FALSE
#>                                           uri
#> 1 http://www.ncbi.nlm.nih.gov/taxonomy/315567
#> 2   http://www.ncbi.nlm.nih.gov/taxonomy/3339
#> 3   http://www.ncbi.nlm.nih.gov/taxonomy/9696

Contributors

Road map

Check out our milestones to see what we plan to get done for each version.

Meta

  • Please report any issues or bugs.
  • License: MIT
  • Get citation information for taxize in R by running citation(package = 'taxize')
  • Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Version: 0.7.9
Install: install.packages('taxize')
Monthly Downloads: 5,426
License: MIT + file LICENSE
Maintainer: Scott Chamberlain
Last Published: July 23rd, 2016

Functions in taxize (0.7.9)

class2tree

Convert list of classifications to a tree.
apg_orders

MOBOT order names
classification

Retrieve the taxonomic hierarchy for a given taxon ID.
apg

Get APG names
apg_lookup

Lookup in the APGIII taxonomy and replace family names
col_classification

Search Catalogue of Life for taxonomic classifications.
apg_families

MOBOT family names
bold_search

Search Barcode of Life for taxonomic IDs
children

Retrieve immediate children taxa for a given taxon name or ID.
col_children

Search Catalogue of Life for direct children of a particular taxon.
eol_dataobjects

Given the identifier for a data object, return all metadata about the object
eol_invasive

Search for presence of taxonomic names in EOL invasive species databases.
eol_search

Search for terms in EOL database.
col_downstream

Use Catalogue of Life to get downstream taxa to a given taxonomic level.
col_search

Search Catalogue of Life for taxonomic IDs
comm2sci

Get scientific names from common names.
eubon

EU BON taxonomy
eol_pages

Search for pages in EOL database using a taxonconceptID.
downstream

Retrieve the downstream taxa for a given taxon name or ID.
eol_hierarchy

Retrieve the taxonomic hierarchy from given EOL taxonID.
get_gbifid

Get the GBIF backbone taxon ID from taxonomic names.
genbank2uid

Get NCBI taxonomy UID from GenBankID
get_eolid

Get the EOL ID from Encyclopedia of Life from taxonomic names.
gbif_downstream

Retrieve all taxa names downstream in hierarchy for GBIF
fungorum

Index Fungorum
get_boldid

Get the BOLD (Barcode of Life) code for a search term.
get_colid

Get the Catalogue of Life ID from taxonomic names.
get_genes_avail

Retrieve gene sequences from NCBI by accession number.
gbif_parse

Parse taxon names using the GBIF name parser.
gbif_name_usage

Lookup details for specific names in all taxonomies in GBIF.
getacceptednamesfromtsn

Get accepted names from tsn
get_seqs

Retrieve gene sequences from NCBI by accession number.
get_tpsid

Get the NameID codes from Tropicos for taxonomic names.
get_ids

Retrieve taxonomic identifiers for a given taxon name.
get_genes

Retrieve gene sequences from NCBI by accession number.
get_tsn

Get the TSN code for a search term.
getanymatchcount

Get any match count.
get_nbnid

Get the UK National Biodiversity Network ID from taxonomic names.
get_ubioid

Get the uBio id for a search term
get_uid

Get the UID codes from NCBI for taxonomic names.
getcommentdetailfromtsn

Get comment detail from TSN
getdatedatafromtsn

Get date data from tsn
gethierarchydownfromtsn

Get hierarchy down from tsn
getcurrencyfromtsn

Get currency from tsn
getcommonnamesfromtsn

Get common names from tsn
getitistermsfromscientificname

Get itis terms from scientific names
gethierarchyupfromtsn

Get hierarchy up from tsn
getjurisdictionaloriginfromtsn

Get jurisdictional origin from tsn
getsynonymnamesfromtsn

Returns a list of the synonyms (if any) for the TSN.
gettaxonauthorshipfromtsn

Returns the author information for the TSN.
getitistermsfromcommonname

Get itis terms from common names
getitisterms

Get itis terms from common names
getcredibilityratings

Get possible credibility ratings
getcredibilityratingfromtsn

Get credibility rating from tsn
getkingdomnames

Get all possible kingdom names
getdescription

Get description of the ITIS service
getexpertsfromtsn

Get expert information for the TSN.
getlastchangedate

Provides the date the ITIS database was last updated.
getgeographicvalues

Get all possible geographic values
getglobalspeciescompletenessfromtsn

Get global species completeness from tsn
getkey

Function to get API key.
getkingdomnamefromtsn

Get kingdom names from tsn
getrecordfromlsid

Gets the partial ITIS record for the TSN in the LSID, found by comparing the TSN in the search key to the TSN field. Returns an empty result set if there is no match or the TSN is invalid.
getranknames

Provides a list of all the unique rank names contained in the database and their kingdom and rank ID values.
gettsnfromlsid

Gets the TSN corresponding to the LSID, or an empty result if there is no match.
gettsnbyvernacularlanguage

Get tsn by vernacular language
gnr_datasources

Get data sources for the Global Names Resolver.
itis_getrecord

Get full ITIS record for one or more ITIS TSNs or LSIDs.
itis_downstream

Retrieve all taxa names or TSNs downstream in hierarchy from given TSN.
gnr_resolve

Resolve names using Global Names Resolver.
itis_native

Get jurisdiction data, i.e., native or not native in a region.
itis_refs

Get references related to an ITIS TSN.
ncbi_get_taxon_summary

NCBI taxon information from uids
ncbi_getbyid

Retrieve gene sequences from NCBI by accession number.
getcoremetadatafromtsn

Get core metadata from tsn
getcoveragefromtsn

Get coverage from tsn
getfullhierarchyfromtsn

Get full hierarchy from tsn
getfullrecordfromlsid

Returns the full ITIS record for the TSN in the LSID, found by comparing the TSN in the search key to the TSN field. Returns an empty result set if there is no match or the TSN is invalid.
getjurisdictionoriginvalues

Get jurisdiction origin values
getjurisdictionvalues

Get possible jurisdiction values
getparenttsnfromtsn

Returns the parent TSN for the entered TSN.
gisd_isinvasive

Check invasive species status for a set of species from GISD database
getpublicationsfromtsn

Returns a list of the publications used for the TSN.
synonyms

Retrieve synonyms from various sources given input taxonomic names or identifiers.
gni_details

Search for taxonomic name details using the Global Names Index.
tax_agg

Aggregate species data to given taxonomic rank
tnrs_sources

TNRS sources
tp_accnames

Return all accepted names for a taxon name with a given id.
tnrs

Phylotastic Taxonomic Name Resolution Service.
tp_classification

Return all synonyms for a taxon name with a given id.
ubio_search

This function will return NameBankIDs that match given search terms
ubio_synonyms

Search uBio for taxonomic synonyms by hierarchiesID.
names_list

Get a random vector of species names.
lowest_common

Retrieve the lowest common taxon and rank for a given taxon name or ID
itis_acceptname

Retrieve accepted TSN and name
ipni_search

Search for names in the International Plant Names Index (IPNI).
resolve

Resolve names from different data sources
scrapenames

Resolve names using Global Names Recognition and Discovery.
sci2comm

Get common names from scientific names.
searchbycommonname

Search for tsn by common name
tax_name

Get taxonomic names for a given rank
tax_rank

Get rank for a given taxonomic name.
ubio_classification_search

This function will return ClassificationBankIDs (hierarchiesIDs) that refer to the given NamebankID
upstream

Retrieve the upstream taxa for a given taxon name or ID.
ubio_classification

uBio classification
vascan_search

Search the CANADENSYS Vascan API.
itis_hierarchy

ITIS hierarchy
itis_kingdomnames

Get kingdom names
iucn_status

Extractor functions for iucn-class.
iucn_summary

Get a summary from the IUCN Red List
phylomatic_format

Get family names to make Phylomatic input object, and output input string to Phylomatic for use in the function phylomatic_tree.
phylomatic_tree

Query Phylomatic for a phylogenetic tree.
searchforanymatchpaged

Search for any matched page
tp_search

Search Tropicos by scientific name, common name, or Tropicos ID.
status_codes

Get HTTP status codes
tp_summary

Return summary data for a taxon name with a given id.
getfullrecordfromtsn

Get full record from TSN.
getgeographicdivisionsfromtsn

Get geographic divisions from tsn
getlsidfromtsn

Gets the unique LSID for the TSN, or an empty result if there is no match.
getothersourcesfromtsn

Returns a list of the other sources used for the TSN.
gettaxonomicranknamefromtsn

Returns the kingdom and rank information for the TSN.
gettaxonomicusagefromtsn

Returns the usage information for the TSN.
getvernacularlanguages

Provides a list of the unique languages used in the vernacular table.
getunacceptabilityreasonfromtsn

Returns the unacceptability reason, if any, for the TSN.
itis_lsid

Get kingdom names
itis_name

Get taxonomic names for a given taxonomic name query.
ion

ION - Index to Organism Names
iplant_resolve

iPlant name resolution
itis_searchcommon

Searches common name and acts as thin wrapper around searchbycommonnamebeginswith and searchbycommonnameendswith
nbn_synonyms

Return all synonyms for a taxon name with a given id from NBN
ncbi_children

Search NCBI for children of a taxon
itis_taxrank

Retrieve taxonomic rank name from given TSN.
ping

Ping an API used in taxize to see if it's working.
plantGenusNames

Vector of plant genus names from ThePlantList
searchbycommonnamebeginswith

Search for tsn by common name beginning with
searchbycommonnameendswith

Search for tsn by common name ending with
itis_terms

Get ITIS terms, i.e., TSNs, authors, common names, and scientific names.
itis-api

Low level functions for working with the ITIS API.
nbn_classification

Search UK National Biodiversity Network database for taxonomic classification
nbn_search

Search UK National Biodiversity Network database
rank_ref

Lookup-table for IDs of taxonomic ranks
rankagg

Aggregate data by given taxonomic rank
taxize-package

Taxonomic Information from Around the Web
theplantlist

Lookup-table for family, genus, and species names for ThePlantList
tp_namereferences

Return all reference records for a taxon name with a given id.
tp_refs

Return all reference records for a taxon name with a given id.
taxize_ldfast

Replacement function for ldply that should be faster in all cases.
taxize-defunct

Defunct functions in taxize
tp_dist

Return all distribution records for a taxon name with a given id.
tp_namedistributions

Return all distribution records for a taxon name with a given id.
ubio_id

Search uBio by namebank ID.
ubio_ping

uBio ping
tp_synonyms

Return all synonyms for a taxon name with a given id.
tpl_families

Get The Plant List families.
getreviewyearfromtsn

Returns the review year for the TSN.
getscientificnamefromtsn

Returns the scientific name for the TSN. Also returns the component parts (names and indicators) of the scientific name.
gni_parse

Parse scientific names using EOL's name parser.
gni_search

Search for taxonomic names using the Global Names Index.
iucn_getname

Get any matching IUCN species names
iucn_id

Get an ID for an IUCN listed taxon
ncbi_getbyname

Retrieve gene sequences from NCBI by taxon name and gene names.
plantminer

Search for taxonomy data from Plantminer.com
ncbi_search

Search for gene sequences available for taxa from NCBI.
plantNames

Vector of plant species (genus - specific epithet) names from ThePlantList
searchbyscientificname

Search by scientific name
searchforanymatch

Search for any match
taxize_cite

Get citations and licenses for data sources used in taxize
taxize_capwords

Capitalize the first letter of a character string.
tol_resolve

Resolve names using Open Tree of Life resolver
tp_acceptednames

Return all accepted names for a taxon name with a given id.
tpl_get

Get The Plant List csv files.
tpl_search

A light wrapper around the Taxonstand function to call the Theplantlist.org database.