get_uid: Get the UID codes from NCBI for taxonomic names.

Description

Retrieve the Unique Identifier (UID) of a taxon from NCBI taxonomy browser.

Usage

get_uid(sciname, ask = TRUE, verbose = TRUE, rows = NA, modifier = NULL, rank_query = NULL, division_filter = NULL, rank_filter = NULL, ...)
as.uid(x, check = TRUE)
## S3 methods of as.uid for the supported input classes share the signature above
## S3 method of as.data.frame for class 'uid'
as.data.frame(x, ...)
get_uid_(sciname, verbose = TRUE, rows = NA)

Arguments

sciname

character; scientific name.

ask

logical; should get_uid be run in interactive mode? If TRUE and more than one UID is found for the species, the user is asked for input. If FALSE, NA is returned for multiple matches.

verbose

logical; if TRUE, the taxon being queried is printed to the console.

rows

numeric; any number from 1 to infinity. If NA (the default), all rows are considered. Note that this function still returns only a uid class object with one or more identifiers. See get_uid_ to get back all, or a subset, of the raw data that you are presented with during the ask process.

modifier

(character) A modifier to the sciname given. Options include: Organism, Scientific Name, Common Name, All Names, Division, Filter, Lineage, GC, MGC, Name Tokens, Next Level, PGC, Properties, Rank, Subtree, Synonym, Text Word. These are not checked, so make sure they are entered correctly, as is.

rank_query

(character) A taxonomic rank name to modify the query sent to NCBI. See rank_ref for possible options, but note that some data sources use atypical ranks, so inspect the data itself for options. Optional. See Querying below.

division_filter

(character) A division (aka phylum) name used to filter data after it is retrieved from NCBI. Optional. See Filtering below.

rank_filter

(character) A taxonomic rank name used to filter data after it is retrieved from NCBI. See rank_ref for possible options, but note that some data sources use atypical ranks, so inspect the data itself for options. Optional. See Filtering below.

...

Ignored

x

Input to as.uid

check

logical; check whether the ID matches an existing record in the database. Only used in as.uid.

Value

A vector of taxonomic identifiers as an S3 class. If a taxon is not found, NA is given. If more than one identifier is found, the function asks for user input if ask = TRUE; otherwise it returns NA. If ask = FALSE and rows is not NA, a data.frame is given back instead. That data.frame is not of the uid class, so you can't pass it on to other functions as you normally can. The returned object comes with the following attributes:

match (character) - the match status, which also gives the reason for an NA: 'not found', 'found', or, if ask = FALSE, 'NA due to ask=FALSE'
multiple_matches (logical) - Whether multiple matches were returned by the data source. This can be TRUE even if you get 1 name back, because we try to pattern match the name to see if there are any direct matches. So sometimes this attribute is TRUE together with pattern_match, in which case 1 resulting name is returned without a user prompt.
pattern_match (logical) - Whether a pattern match was made. If TRUE then multiple_matches must be TRUE, and we found a perfect match to your name, ignoring case. If FALSE
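
The attributes listed above can be read with attr(). The following is a minimal sketch on a hand-built mock object, not on real get_uid output: the class name and attribute names come from this page, while the specific values are assumptions chosen for illustration.

```r
# Mock object built by hand; it only mimics the class and attribute names
# described above and is NOT the result of a real get_uid() call.
res <- structure(
  c("315567", NA),
  class = "uid",
  match = c("found", "not found"),
  multiple_matches = c(FALSE, FALSE),
  pattern_match = c(FALSE, FALSE)
)

# The attributes line up element-wise with the identifiers,
# so they can be used to keep only the names that were found.
found <- attr(res, "match") == "found"
ids <- as.character(res)[found]
ids
```

Checking the match attribute this way distinguishes an NA caused by a missing taxon from one caused by ask = FALSE.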

Querying

The parameter rank_query is used in the search sent to NCBI, whereas rank_filter filters data after it comes back. The parameter modifier adds modifiers to the name. For example, modifier="Organism" adds that to the name, giving e.g., Helianthus[Organism].
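
To make the effect of these two parameters concrete, here is a rough sketch of how a search term might be assembled. The helper build_term is hypothetical and not part of the package; the modifier form follows the Helianthus[Organism] example above, while the "AND <rank>[Rank]" form for rank_query is an assumption about the query syntax.

```r
# Hypothetical helper for illustration only; not a function of the package.
build_term <- function(sciname, modifier = NULL, rank_query = NULL) {
  term <- sciname
  # modifier is appended to the name in square brackets
  if (!is.null(modifier)) term <- sprintf("%s[%s]", term, modifier)
  # ASSUMPTION: the rank is added as an extra "[Rank]" clause
  if (!is.null(rank_query)) term <- sprintf("%s AND %s[Rank]", term, rank_query)
  term
}

build_term("Helianthus", modifier = "Organism")  # "Helianthus[Organism]"
build_term("Pinus", rank_query = "genus")        # "Pinus AND genus[Rank]"
```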

Filtering

The parameters division_filter and rank_filter are not used in the search to the data provider, but are used in filtering the data down to a subset that is closer to the target you want. For all these parameters, you can use regex strings since we use grep internally to match. Filtering narrows down to the set that matches your query, and removes the rest.
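
The filtering step can be pictured with a small mock table. This is only a sketch: the data.frame, its column names, the placeholder identifiers, and the helper filter_candidates are all assumptions for illustration; the only detail taken from this page is that matching is done with grep, so regular expressions work.

```r
# Mock candidate table; identifiers are placeholders, not real UIDs,
# and the rank values are assumed.
candidates <- data.frame(
  uid = c("id_1", "id_2"),
  division = c("monocots", "butterflies"),
  rank = c("genus", "genus"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose fields match the given patterns.
filter_candidates <- function(df, division_filter = NULL, rank_filter = NULL) {
  if (!is.null(division_filter)) {
    df <- df[grep(division_filter, df$division), , drop = FALSE]
  }
  if (!is.null(rank_filter)) {
    df <- df[grep(rank_filter, df$rank), , drop = FALSE]
  }
  df
}

filter_candidates(candidates, division_filter = "monocots")$uid  # "id_1"
filter_candidates(candidates, division_filter = "^b")$uid        # "id_2"
filter_candidates(candidates, rank_filter = "genus")$uid         # "id_1" "id_2"
```

Because the patterns are regular expressions, a short pattern such as "m" matches any division containing that letter, which is why a one-letter filter can still narrow the result to a single row.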

Beware

NCBI does funny things sometimes. For example, if you search on Fringella morel (a slight misspelling of the genus name combined with a non-existent epithet), NCBI gives back a morel fungal species. In addition, NCBI doesn't really do fuzzy searching very well, so if there is a slight misspelling in your names, you likely won't get what you are expecting. The lesson: clean your names before using this function. Other data sources are better about fuzzy matching.
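
As a small illustration of name cleaning, the sketch below normalizes whitespace and capitalization using base R only. The helper clean_name is hypothetical and not part of the package, and it does not correct misspellings; for those you would still need a name-resolution step against a source with better fuzzy matching.

```r
# Hypothetical helper for illustration; handles formatting, not spelling.
clean_name <- function(x) {
  x <- trimws(x)                     # drop leading/trailing whitespace
  x <- gsub("[[:space:]]+", " ", x)  # collapse internal runs of whitespace
  x <- tolower(x)
  # capitalize the first letter (the genus), leave the rest lower case
  out <- paste0(toupper(substring(x, 1, 1)), substring(x, 2))
  out[is.na(x)] <- NA_character_     # keep missing names as NA
  out
}

clean_name(c("  chironomus   RIPARIUS ", "Pinus contorta"))
# "Chironomus riparius" "Pinus contorta"
```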

Examples

## Not run: 
# get_uid(c("Chironomus riparius", "Chaetopteryx"))
# get_uid(c("Chironomus riparius", "aaa vva"))
# 
# # When not found
# get_uid("howdy")
# get_uid(c("Chironomus riparius", "howdy"))
# 
# # Narrow down results to a division or rank, or both
# ## By modifying the query
# ### w/ modifiers to the name
# get_uid(sciname = "Aratinga acuticauda", modifier = "Organism")
# get_uid(sciname = "bear", modifier = "Common Name")
# 
# ### w/ rank query
# get_uid(sciname = "Pinus", rank_query = "genus")
# get_uid(sciname = "Pinus", rank_query = "subgenus")
# ### division query doesn't really work, for unknown reasons, so not available
# 
# ## By filtering the result
# ## Echinacea example
# ### Results w/o narrowing
# get_uid("Echinacea")
# ### w/ division
# get_uid(sciname = "Echinacea", division_filter = "eudicots")
# get_uid(sciname = "Echinacea", division_filter = "sea urchins")
# 
# ## Satyrium example
# ### Results w/o narrowing
# get_uid(sciname = "Satyrium")
# ### w/ division
# get_uid(sciname = "Satyrium", division_filter = "monocots")
# get_uid(sciname = "Satyrium", division_filter = "butterflies")
# 
# ## Rank example
# get_uid(sciname = "Pinus")
# get_uid(sciname = "Pinus", rank_filter = "genus")
# get_uid(sciname = "Pinus", rank_filter = "subgenus")
# 
# # Fuzzy filter on any filtering fields
# ## uses grep on the inside
# get_uid("Satyrium", division_filter = "m")
# 
# # specify rows to limit choices available
# get_uid('Dugesia') # user prompt needed
# get_uid('Dugesia', rows=1) # 2 choices, but rows=1 keeps only the first row, so no prompt
# get_uid('Dugesia', ask = FALSE) # returns NA for multiple matches
# 
# # Go to a website with more info on the taxon
# res <- get_uid("Chironomus riparius")
# browseURL(attr(res, "uri"))
# 
# # Convert a uid without class information to a uid class
# as.uid(get_uid("Chironomus riparius")) # already a uid, returns the same
# as.uid(get_uid(c("Chironomus riparius","Pinus contorta"))) # same
# as.uid(315567) # numeric
# as.uid(c(315567,3339,9696)) # numeric vector, length > 1
# as.uid("315567") # character
# as.uid(c("315567","3339","9696")) # character vector, length > 1
# as.uid(list("315567","3339","9696")) # list, either numeric or character
# ## don't check, much faster
# as.uid("315567", check=FALSE)
# as.uid(315567, check=FALSE)
# as.uid(c("315567","3339","9696"), check=FALSE)
# as.uid(list("315567","3339","9696"), check=FALSE)
# 
# (out <- as.uid(c(315567,3339,9696)))
# data.frame(out)
# as.uid( data.frame(out) )
# 
# # Get all data back
# get_uid_("Puma concolor")
# get_uid_("Dugesia")
# get_uid_("Dugesia", rows=2)
# get_uid_("Dugesia", rows=1:2)
# get_uid_(c("asdfadfasd","Pinus contorta"))
# 
# # use curl options
# library("httr")
# get_uid("Quercus douglasii", config=verbose())
# bb <- get_uid("Quercus douglasii", config=progress())
# ## End(Not run)
