ncbi_search: Search for gene sequences available for a species from NCBI.

Description

Search for gene sequences available for a species from NCBI.

Usage

ncbi_search(taxa, seqrange = "1:3000", getrelated = FALSE, verbose = TRUE)

Arguments

taxa

Scientific name(s) to search for (character).

seqrange

Sequence range, e.g., "1:1000" (character).

getrelated

Logical; if TRUE, gets the longest sequences of a species in the same genus as the one searched for. If FALSE (default), no related sequences are retrieved.

verbose

Logical; if TRUE (default), informative messages are printed.

Value

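A data.frame of results. When taxa holds several names, the examples treat the result as a list with one element per name.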

Details

Predicted sequences are removed from the results automatically, so you do not have to filter them out yourself. Predicted sequences are those whose accession numbers have an "XM_" or "XR_" prefix.
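
The prefix rule can be illustrated with a minimal sketch in base R. This is not the function's internal code; the data frame res and its column name acc_no are hypothetical and are used only to show what dropping "XM_" and "XR_" accessions looks like.

```r
# Hypothetical results table; the column name `acc_no` is an assumption
# made for illustration, not necessarily what ncbi_search() returns.
res <- data.frame(
  acc_no = c("XM_012345.1", "JX190826.1", "XR_000111.2", "NM_001200.3"),
  length = c(1200L, 950L, 800L, 1500L),
  stringsAsFactors = FALSE
)

# TRUE for accession numbers that start with "XM_" or "XR_"
is_predicted <- grepl("^X[MR]_", res$acc_no)

# Keep only the records that are not predicted
kept <- res[!is_predicted, ]
kept$acc_no
```

Anchoring the pattern with ^ matters here: it matches the prefix only, so an accession that merely contains "XM_" elsewhere in the string is not dropped.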

Examples

# A single species
out <- ncbi_search(taxa="Umbra limi", seqrange = "1:2000")
# get list of genes available, removing non-unique
unique(out$genesavail)
# does the string 'RAG1' exist in any of the gene names
out[grep("RAG1", out$genesavail, ignore.case=TRUE),]

# A single species without records in NCBI
out <- ncbi_search(taxa="Sequoia wellingtonia", seqrange="1:2000", getrelated=TRUE)

# Many species, can run in parallel or not using plyr
species <- c("Salvelinus alpinus","Ictalurus nebulosus","Carassius auratus")
out2 <- ncbi_search(taxa=species, seqrange = "1:2000")
lapply(out2, head) # see heads of all
out2df <- plyr::ldply(out2) # make data.frame of all (requires the plyr package)
unique(out2df$genesavail) # get list of genes available, removing non-unique
out2df[grep("RAG1", out2df$genesavail, ignore.case=TRUE),] # search across all