getGenome: Genome Retrieval

Description

This function retrieves a fasta file containing the genome of an organism of interest and stores it in the folder '_ncbi_downloads/genomes'.

Usage

getGenome(db = "refseq", kingdom, organism,
  path = file.path("_ncbi_downloads", "genomes"))

Arguments

db

a character string specifying the database from which the genome shall be retrieved: 'refseq'. Right now only the RefSeq database is included. Later versions of biomartr will also allow sequence retrieval from additional databases.

kingdom

a character string specifying the kingdom of the organism of interest, e.g. "archaea", "bacteria", "fungi", "invertebrate", "plant", "protozoa", "vertebrate_mammalian", or "vertebrate_other".

organism

a character string specifying the scientific name of the organism of interest, e.g. 'Arabidopsis thaliana'.

path

a character string specifying the location (a folder) in which the corresponding genome shall be stored. Default is path = file.path("_ncbi_downloads","genomes").
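
The kingdom values listed above could be checked before any download is attempted. The following is a minimal sketch under that assumption; `valid_kingdoms` and `check_kingdom()` are hypothetical names for illustration and are not part of biomartr.

```r
# Hypothetical helper (not part of biomartr): validate the 'kingdom' argument
# against the values named in this documentation.
valid_kingdoms <- c("archaea", "bacteria", "fungi", "invertebrate", "plant",
                    "protozoa", "vertebrate_mammalian", "vertebrate_other")

check_kingdom <- function(kingdom) {
  ok <- is.character(kingdom) && length(kingdom) == 1L &&
    kingdom %in% valid_kingdoms
  if (!ok) {
    stop("'kingdom' must be one of: ",
         paste(valid_kingdoms, collapse = ", "), call. = FALSE)
  }
  # return the validated value invisibly so the call can be chained
  invisible(kingdom)
}

check_kingdom("plant")
```

Calling `check_kingdom("plants")` would stop with an error listing the accepted values.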

Value

A data.table storing the gene ids in the first column and the DNA sequence in the second column.

Details

Internally, this function loads the overview.txt file from NCBI:

refseq: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/

and creates a directory '_ncbi_downloads/genomes' to store the genome of interest as a fasta file for future processing. If the corresponding fasta file already exists in the '_ncbi_downloads/genomes' folder and is accessible from the workspace, no download is performed.
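
The "download only if the file is missing" behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not biomartr's actual implementation: `genome_file()` and `fetch_if_missing()` are hypothetical helpers, and the file naming pattern is inferred from the example on this page ('Arabidopsis_thaliana_genome.fna.gz').

```r
# Hypothetical helper: build the local file name for an organism, assuming the
# '<Genus>_<species>_genome.fna.gz' pattern seen in the example on this page.
genome_file <- function(organism,
                        path = file.path("_ncbi_downloads", "genomes")) {
  file.path(path,
            paste0(gsub(" ", "_", organism, fixed = TRUE), "_genome.fna.gz"))
}

# Hypothetical helper: create the target folder if needed and download only
# when the file is not already present. Returns TRUE if a download was
# triggered and FALSE if the existing file was reused.
fetch_if_missing <- function(url, dest, downloader = utils::download.file) {
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  if (file.exists(dest)) {
    return(invisible(FALSE))
  }
  downloader(url, destfile = dest, mode = "wb")
  invisible(TRUE)
}

genome_file("Arabidopsis thaliana")
```

The `downloader` argument is there only so the caching logic can be exercised without network access; in real use it would default to `utils::download.file` and `url` would be the genome location resolved from the NCBI overview file.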

References

ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq

http://www.ncbi.nlm.nih.gov/refseq/about/

Examples

# download the genome of Arabidopsis thaliana from refseq
# and store the corresponding genome file in '_ncbi_downloads/genomes'
getGenome( db       = "refseq",
           kingdom  = "plant",
           organism = "Arabidopsis thaliana",
           path = file.path("_ncbi_downloads","genomes"))

# read the downloaded genome file into R
file_path <- file.path("_ncbi_downloads","genomes","Arabidopsis_thaliana_genome.fna.gz")
Ath_genome <- read_genome(file_path, format = "fasta")