ncbi_search: Search for gene sequences available for a species from NCBI.

Description

Search for gene sequences available for a species from NCBI.

Usage

ncbi_search(taxa, seqrange = "1:3000", getrelated = FALSE, limit = 500,
  verbose = TRUE)

Arguments

taxa

Scientific name(s) to search for (character).

seqrange

Sequence range, e.g., "1:1000" (character).

getrelated

Logical. If TRUE, gets the longest sequences of a species in the same genus as the one searched for. If FALSE (default), no related sequences are retrieved.

limit

Number of sequences to search for and return. Maximum of 10,000. If you request 6000 records and only 5000 are found, only 5000 are returned.

verbose

Logical. If TRUE (default), informative messages are printed.

Value

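Data.frame of results. As the examples below suggest, a search over several taxa gives a list of data.frames, one per taxon.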

Details

Predicted sequences are removed from the results automatically, so you don't have to filter them out yourself. Predicted sequences are those whose accession numbers have an "XM_" or "XR_" prefix.
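
The prefix rule described above can be illustrated with a small base R sketch. This is not the function's actual implementation, only one way such a filter could be written; the accession numbers are invented for illustration.

```r
# Hypothetical accession numbers, made up for illustration only
acc <- c("NM_000546", "XM_011544", "XR_002956", "AY123456")

# TRUE where the accession starts with a predicted-sequence prefix
predicted <- grepl("^(XM_|XR_)", acc)

# Keep only the records that are not predicted
kept <- acc[!predicted]
print(kept)
```

Anchoring the pattern with `^` matters here: it keeps an accession that merely contains "XM_" somewhere in the middle from being dropped.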

Examples

# A single species
out <- ncbi_search(taxa="Umbra limi", seqrange = "1:2000")
# get list of genes available, removing non-unique
unique(out$genesavail)
# does the string 'RAG1' exist in any of the gene names
out[grep("RAG1", out$genesavail, ignore.case=TRUE),]

# A single species without records in NCBI
out <- ncbi_search(taxa="Sequoia wellingtonia", seqrange="1:2000", getrelated=TRUE)

# Many species (plyr is used below to combine the results)
species <- c("Salvelinus alpinus","Ictalurus nebulosus","Carassius auratus")
out2 <- ncbi_search(taxa=species, seqrange = "1:2000")
lapply(out2, head) # see heads of all
out2df <- plyr::ldply(out2) # make data.frame of all (requires the plyr package)
unique(out2df$genesavail) # get list of genes available, removing non-unique
out2df[grep("RAG1", out2df$genesavail, ignore.case=TRUE),] # search across all
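
If plyr is not installed, the combine-and-search step can be approximated in base R. The sketch below uses mock data in place of a real search result, so it runs without network access. Only the `genesavail` column name comes from the examples above; the `taxon` column and the gene strings are assumptions made for illustration.

```r
# Mock results standing in for the per-taxon list used above;
# the values are invented and only `genesavail` is a real column name
mock <- list(
  "Salvelinus alpinus" = data.frame(genesavail = c("RAG1 gene", "cytb"),
                                    stringsAsFactors = FALSE),
  "Carassius auratus" = data.frame(genesavail = "rag1 mRNA",
                                   stringsAsFactors = FALSE)
)

# Prepend each list name as a `taxon` column (hypothetical column name)
pieces <- Map(
  function(df, nm) data.frame(taxon = nm, df, stringsAsFactors = FALSE),
  mock, names(mock)
)

# Stack the per-taxon pieces into a single data.frame
combined <- do.call(rbind, unname(pieces))
rownames(combined) <- NULL

# Case-insensitive search across all taxa
hits <- combined[grep("RAG1", combined$genesavail, ignore.case = TRUE), ]
print(hits)
```

Carrying the taxon name into a column keeps track of which species each row came from once the list structure is gone.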