biomartr (version 0.9.0)

getProteome: Proteome Retrieval

Description

Main proteome retrieval function for an organism of interest. By specifying the scientific name of the organism, the corresponding FASTA file storing its proteome can be downloaded and stored locally. Proteome files can be retrieved from several databases.

Usage

getProteome(db = "refseq", organism, reference = TRUE,
  release = NULL, gunzip = FALSE, path = file.path("_ncbi_downloads",
  "proteomes"))

Arguments

db

a character string specifying the database from which the proteome shall be retrieved:

  • db = "refseq"

  • db = "genbank"

  • db = "ensembl"

  • db = "uniprot"

organism

there are three options to characterize an organism:

  • by scientific name: e.g. organism = "Homo sapiens"

  • by database specific accession identifier: e.g. organism = "GCF_000001405.37" (= NCBI RefSeq identifier for Homo sapiens)

  • by taxonomic identifier from NCBI Taxonomy: e.g. organism = "9606" (= taxid of Homo sapiens)
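
The three identifier styles listed above can be tried side by side. The following is a minimal sketch, assuming biomartr is installed and an internet connection is available. The names `organism_ids` and `run_download` are illustrative and not part of biomartr, and whether each identifier resolves depends on the current state of the database.

```r
# Three ways to refer to Homo sapiens, taken from the argument description.
organism_ids <- c(
  scientific_name  = "Homo sapiens",
  refseq_accession = "GCF_000001405.37",
  ncbi_taxid       = "9606"
)

# Hypothetical switch: set to TRUE to actually contact NCBI and download.
run_download <- FALSE

if (run_download && requireNamespace("biomartr", quietly = TRUE)) {
  for (id in organism_ids) {
    # Each call is expected to return the path to the downloaded file.
    file_path <- biomartr::getProteome(db = "refseq", organism = id)
    message(id, " -> ", file_path)
  }
}
```

With `run_download` left at FALSE the block only defines the identifiers, so it can be sourced without network access.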

reference

a logical value indicating whether or not a genome shall be downloaded if it isn't marked in the database as either a reference genome or a representative genome.

release

the database release version of ENSEMBL (db = "ensembl"). Default is release = NULL meaning that the most recent database version is used.
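
As a hedged illustration of the release argument, the sketch below contrasts a pinned release with the default. The release number 100 is an arbitrary value chosen for illustration and is assumed to be accepted as a number; `ensembl_release` and `run_download` are hypothetical names, not part of biomartr.

```r
# Hypothetical values for illustration only.
ensembl_release <- 100     # arbitrary release number, assumed to be available
run_download    <- FALSE   # set to TRUE to actually contact Ensembl

if (run_download && requireNamespace("biomartr", quietly = TRUE)) {
  # Pin the download to one specific Ensembl release.
  pinned <- biomartr::getProteome(
    db       = "ensembl",
    organism = "Homo sapiens",
    release  = ensembl_release
  )

  # release = NULL (the default) requests the most recent release.
  latest <- biomartr::getProteome(
    db       = "ensembl",
    organism = "Homo sapiens",
    release  = NULL
  )
}
```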

gunzip

a logical value indicating whether or not files should be unzipped.

path

a character string specifying the location (a folder) in which the corresponding proteome shall be stored. Default is path = file.path("_ncbi_downloads","proteomes").

Value

File path to downloaded proteome.
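
The following sketch shows one way the gunzip and path arguments and the returned file path might be used together. It assumes biomartr is installed and NCBI is reachable; `target_dir` and `run_download` are hypothetical names introduced for this example, and the check on the return value is a defensive assumption rather than documented behaviour.

```r
# Hypothetical target folder inside the session's temporary directory.
target_dir   <- file.path(tempdir(), "proteomes")
run_download <- FALSE   # set to TRUE to actually contact NCBI

if (run_download && requireNamespace("biomartr", quietly = TRUE)) {
  file_path <- biomartr::getProteome(
    db       = "refseq",
    organism = "Arabidopsis thaliana",
    gunzip   = TRUE,        # ask for an unzipped file
    path     = target_dir
  )

  # The return value is documented as the path to the downloaded proteome.
  if (is.character(file_path) && length(file_path) == 1L &&
      file.exists(file_path)) {
    message("Proteome stored at: ", file_path)
  } else {
    message("No file found at the returned path.")
  }
}
```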

Details

Internally this function loads the overview.txt file from NCBI:

refseq: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/

genbank: ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/

and creates a directory '_ncbi_downloads/proteomes' to store the proteome of interest as a FASTA file for future processing.
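
To see what has already been stored in the default directory, base R alone is enough. This is a small sketch that assumes the default path was used and that the working directory is the one from which getProteome was called; `default_dir` is an illustrative name.

```r
# Default storage location, relative to the current working directory.
default_dir <- file.path("_ncbi_downloads", "proteomes")

if (dir.exists(default_dir)) {
  # List every file that earlier downloads left in the folder.
  downloaded <- list.files(default_dir, full.names = TRUE)
  print(downloaded)
} else {
  message("Nothing downloaded yet: ", default_dir, " does not exist.")
}
```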

See Also

getGenome, getCDS, getGFF, getRNA, getRepeatMasker, getAssemblyStats, meta.retrieval, read_proteome

Examples

# NOT RUN {
# download the proteome of Arabidopsis thaliana from refseq
# and store the corresponding proteome file in '_ncbi_downloads/proteomes'
file_path <- getProteome(db       = "refseq",
                         organism = "Arabidopsis thaliana",
                         path     = file.path("_ncbi_downloads", "proteomes"))

Ath_proteome <- read_proteome(file_path, format = "fasta")

# download the proteome of Arabidopsis thaliana from genbank
# and store the corresponding proteome file in '_ncbi_downloads/proteomes'
file_path <- getProteome(db       = "genbank",
                         organism = "Arabidopsis thaliana",
                         path     = file.path("_ncbi_downloads", "proteomes"))

Ath_proteome <- read_proteome(file_path, format = "fasta")
# }
