ncbi_searcher
From traits v0.1.0
by Scott Chamberlain
Search for gene sequences available for taxa from NCBI.
Usage
ncbi_searcher(taxa = NULL, id = NULL, seqrange = "1:3000",
getrelated = FALSE, limit = 500, entrez_query = NULL,
hypothetical = FALSE, verbose = TRUE)
Arguments
- taxa
- (character) Scientific name to search for.
- id
- (character) Taxonomic id to search for. Not compatible with argument taxa.
- seqrange
- (character) Sequence range, e.g., "1:1000". This is the range of sequence lengths to search for, so "1:1000" means search for sequences from 1 to 1000 characters in length.
- getrelated
- (logical) If TRUE, gets the longest sequences of a species in the same genus as the one searched for. If FALSE, returns nothing if no match is found.
- limit
- (numeric) Number of sequences to search for and return. Max of 10,000. If you search for 6000 records and only 5000 are found, you will only get 5000 back.
- entrez_query
- (character; length 1) An Entrez-format query to filter results with. This is useful to search for sequences with specific characteristics. The format is the same as the one used to search GenBank. (http://www.ncbi.nlm.nih.gov/books/NBK383
- hypothetical
- (logical; length 1) If FALSE, an attempt will be made to not return hypothetical or predicted sequences, judging from accession number prefixes (XM and XR). This can result in fewer than limit sequences being returned.
- verbose
- (logical) If TRUE (default), informative messages are printed.
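The XM and XR prefixes named under hypothetical are RefSeq accession prefixes for predicted mRNA and predicted non-coding RNA models. The sketch below shows what a prefix filter of that kind could look like in base R. It is an illustration only: is_predicted_accession is a hypothetical helper, not a traits function, the accession strings are illustrative inputs, and the package's own filtering may be implemented differently.

```r
# Hypothetical helper (not part of traits): TRUE for accessions that
# start with the RefSeq model prefixes XM_ or XR_.
is_predicted_accession <- function(acc) {
  grepl("^(XM|XR)_", acc)
}

# Illustrative accession strings, used only to exercise the filter
acc <- c("NM_000546.6", "XM_011544567.2", "XR_002958421.1", "AF123456.1")

# Keep only accessions that are not flagged as predicted
kept <- acc[!is_predicted_accession(acc)]
print(kept)
```

Because a filter like this drops rows from a result set that has already been retrieved, the number of rows kept can be smaller than the number requested.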
Value
A data.frame of results if a single input is given. A list of data.frames if multiple inputs are given.
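When several taxa are searched, the list of data.frames can be stacked into one table with base R alone. The sketch below uses mock data in place of a live NCBI query. It assumes the list is named by taxon, and only the gene_desc column is taken from the examples on this page; the taxon column is added here for illustration.

```r
# Mock stand-in for the list returned by a multi-taxon search
# (assumption: one data.frame per taxon, named by taxon)
res <- list(
  "Salvelinus alpinus" = data.frame(
    gene_desc = c("12S ribosomal RNA", "RAG1"),
    stringsAsFactors = FALSE
  ),
  "Carassius auratus" = data.frame(
    gene_desc = "12S ribosomal RNA",
    stringsAsFactors = FALSE
  )
)

# Add the taxon name as a column to each data.frame, then stack the rows
combined <- do.call(rbind, lapply(names(res), function(nm) {
  data.frame(taxon = nm, res[[nm]], stringsAsFactors = FALSE)
}))

# Rows whose gene description mentions 12S
hits_12s <- combined[grep("12S", combined$gene_desc, ignore.case = TRUE), ]
print(combined)
```

This gives a similar row-bound table to the plyr::ldply call in the examples, without the extra dependency.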
Examples
# A single species
out <- ncbi_searcher(taxa="Umbra limi", seqrange = "1:2000")
# Get the same species information using a taxonomy id
out <- ncbi_searcher(id = "75935", seqrange = "1:2000")
# If the taxon name is unique, using the taxon name and id are equivalent
all(ncbi_searcher(id = "75935") == ncbi_searcher(taxa="Umbra limi"))
# If the taxon name is not unique, use taxon id
# "266948" is the uid for the butterfly genus, but there is also a genus of orchids with the
# same name
nrow(ncbi_searcher(id = "266948")) == nrow(ncbi_searcher(taxa="Satyrium"))
# get list of genes available, removing non-unique
unique(out$gene_desc)
# does the string 'RAG1' exist in any of the gene names
out[grep("RAG1", out$gene_desc, ignore.case=TRUE),]
# A single species without records in NCBI
out <- ncbi_searcher(taxa="Sequoia wellingtonia", seqrange="1:2000", getrelated=TRUE)
# Many species, can run in parallel or not using plyr
species <- c("Salvelinus alpinus","Ictalurus nebulosus","Carassius auratus")
out2 <- ncbi_searcher(taxa=species, seqrange = "1:2000")
lapply(out2, head)
library("plyr")
out2df <- ldply(out2) # make data.frame of all
unique(out2df$gene_desc) # get list of genes available, removing non-unique
out2df[grep("12S", out2df$gene_desc, ignore.case=TRUE), ]
# Using the getrelated and entrez_query options
ncbi_searcher(taxa = "Olpidiopsidales", limit = 5, getrelated = TRUE,
entrez_query = "18S[title] AND 28S[title]")