get_genes_avail: Retrieve gene sequences available for a species from NCBI.

Description

This function retrieves one sequence for each species, picking the longest available for the given gene.

Usage

get_genes_avail(taxon_name, seqrange, getrelated = FALSE,
    verbose = TRUE)

Arguments

taxon_name

Scientific name to search for (character).

seqrange

Sequence range, e.g., "1:1000" (character).

getrelated

Logical. If TRUE, gets the longest sequences of a species in the same genus as the one searched for. If FALSE, gets nothing.

verbose

Logical. If TRUE (default), informative messages are printed.

Value

A data.frame of results.

Details

Predicted sequences are removed automatically, so you don't have to filter them out yourself. Predicted sequences are those whose accession numbers have an "XM_" or "XR_" prefix.
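
The prefix rule above can be sketched in base R. This is only an illustration on made-up accession numbers, not the function's actual source code; the vector `acc` and its values are hypothetical, and the sketch assumes the prefix is matched at the start of the accession string.

```r
# Hypothetical accession numbers, for illustration only
acc <- c("XM_000001.1", "NM_000002.1", "XR_000003.1", "AY000004.1")

# Flag accessions that start with "XM_" or "XR_" (predicted sequences)
is_predicted <- grepl("^(XM_|XR_)", acc)

# Keep only the accessions that are not flagged as predicted
kept <- acc[!is_predicted]
kept
```

The same logical vector could be used to subset rows of a data.frame that has an accession column.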

Examples

library(taxize)
library(plyr) # provides llply() and ldply()

# A single species
out <- get_genes_avail(taxon_name = "Umbra limi", seqrange = "1:2000",
   getrelated = FALSE)
# get list of genes available, removing non-unique
unique(out$genesavail)
# does the string 'RAG1' exist in any of the gene names
out[grep("RAG1", out$genesavail, ignore.case = TRUE), ]

# A single species without records in NCBI
out <- get_genes_avail(taxon_name = "Sequoia wellingtonia", seqrange = "1:2000",
   getrelated = TRUE)

# Many species, can run in parallel or not using plyr
species <- c("Salvelinus alpinus", "Ictalurus nebulosus", "Carassius auratus")
out2 <- llply(species, get_genes_avail, seqrange = "1:2000", getrelated = FALSE)
lapply(out2, head) # see heads of all
out2df <- ldply(out2) # make data.frame of all
unique(out2df$genesavail) # get list of genes available, removing non-unique
out2df[grep("RAG1", out2df$genesavail, ignore.case = TRUE), ] # search across all
