taxize (version 0.2.0)

ncbi_search: Search for gene sequences available for a species from NCBI.

Description

Search for gene sequences available for a species from NCBI.

Usage

ncbi_search(taxa, seqrange = "1:3000", getrelated = FALSE, limit = 500,
  verbose = TRUE)

Arguments

taxa
Scientific name to search for (character).
seqrange
Sequence range, as e.g., "1:1000" (character).
getrelated
Logical. If TRUE, gets the longest sequences of a species in the same genus as the one searched for. If FALSE, returns nothing if no match is found.
limit
Number of sequences to search for and return. Maximum of 10,000. If you request 6000 records and only 5000 are found, only 5000 are returned.
verbose
Logical. If TRUE (default), informative messages are printed.
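
As an illustration of the seqrange format, the sketch below splits a "min:max" string into two integers. It assumes the two numbers are lower and upper bounds (for example, on sequence length in base pairs); that interpretation, and the helper name parse_seqrange, are hypothetical and not taken from the taxize source.

```r
# Hypothetical helper: split a "min:max" range string into integer bounds.
# This is NOT part of taxize; it only illustrates the expected string format.
parse_seqrange <- function(seqrange) {
  parts <- strsplit(seqrange, ":", fixed = TRUE)[[1]]
  bounds <- suppressWarnings(as.integer(parts))
  # Require exactly two whole numbers, with the lower bound first
  if (length(bounds) != 2 || anyNA(bounds) || bounds[1] > bounds[2]) {
    stop("seqrange must look like \"1:3000\"")
  }
  list(min = bounds[1], max = bounds[2])
}

rng <- parse_seqrange("1:3000")
```

Checking the string up front like this gives a clear error for malformed input such as "3000" or "a:b" before any remote query is made.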

Value

  • Data.frame of results.

Details

Predicted sequences are removed from the results automatically, so you do not have to filter them out yourself. Predicted sequences are those whose accession numbers have an "XM_" or "XR_" prefix.
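
The prefix rule described here can be sketched in base R as follows. The accession numbers are made-up illustrative values, and this is only a minimal sketch of the idea, not the code ncbi_search actually runs.

```r
# Illustrative accession numbers (made up for this sketch)
accessions <- c("NM_000546.6", "XM_011521234.2", "XR_001737555.1", "AY123456.1")

# Flag accessions that start with the predicted-sequence prefixes
is_predicted <- grepl("^(XM_|XR_)", accessions)

# Keep only the non-predicted accessions
kept <- accessions[!is_predicted]
```

Anchoring the pattern with "^" ensures that only a leading "XM_" or "XR_" counts, so an accession that merely contains those characters elsewhere is not dropped.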

See Also

ncbi_getbyid, ncbi_getbyname

Examples

library(taxize)

# A single species
out <- ncbi_search(taxa="Umbra limi", seqrange = "1:2000")
# get list of genes available, removing non-unique
unique(out$genesavail)
# does the string 'RAG1' exist in any of the gene names
out[grep("RAG1", out$genesavail, ignore.case=TRUE),]

# A single species without records in NCBI
out <- ncbi_search(taxa="Sequoia wellingtonia", seqrange="1:2000", getrelated=TRUE)

# Many species, can run in parallel or not using plyr
species <- c("Salvelinus alpinus","Ictalurus nebulosus","Carassius auratus")
out2 <- ncbi_search(taxa=species, seqrange = "1:2000")
lapply(out2, head) # see heads of all
out2df <- plyr::ldply(out2) # make data.frame of all (ldply comes from the plyr package)
unique(out2df$genesavail) # get list of genes available, removing non-unique
out2df[grep("RAG1", out2df$genesavail, ignore.case=TRUE),] # search across all