getCDSSet

a character string specifying the database from which the CDS
shall be retrieved:<ul>
<li><code>db = "refseq"</code></li>
<li><code>db = "genbank"</code></li>
<li><code>db = "ensembl"</code></li>
</ul>

a character vector storing the names of the organisms that shall be retrieved.
There are three available options to characterize an organism:<ul>
<li>by <code>scientific name</code>: e.g. <code>organism = "Homo sapiens"</code></li>
<li>by <code>database-specific accession identifier</code>: e.g. <code>organism = "GCF_000001405.37"</code> (= NCBI RefSeq identifier for <code>Homo sapiens</code>)</li>
<li>by <code>taxonomic identifier from NCBI Taxonomy</code>: e.g. <code>organism = "9606"</code> (= taxid of <code>Homo sapiens</code>)</li>
</ul>

organisms

a logical value indicating whether or not a CDS shall be downloaded if it isn't marked
in the database as either a reference CDS or a representative CDS.

reference

the database release version of ENSEMBL (<code>db = "ensembl"</code>). Default is <code>release = NULL</code> meaning
that the most recent database version is used.

release

a logical value indicating whether or not downloaded files shall be renamed for more convenient downstream data analysis.

clean_retrieval

a logical value indicating whether or not files should be unzipped.

gunzip

a logical value indicating whether files that were already downloaded and are still present in the
output folder shall be updated and re-loaded (<code>update = TRUE</code>) or whether the existing files shall be retained (<code>update = FALSE</code>, the default).

update

a character string specifying the location (a folder) in which
the corresponding CDSs shall be stored. Default is
<code>path = "set_CDS"</code>.

path

Main CDS retrieval function for a set of organisms of interest.
When the scientific names of the organisms of interest are specified, the corresponding FASTA files storing their CDS
are downloaded and stored locally. CDS files can be retrieved from several databases.

Perform large scale genomic data retrieval and functional annotation retrieval. This package aims to provide users with a standardized
way to automate genome, proteome, 'RNA', coding sequence ('CDS'), 'GFF', and metagenome
retrieval from 'NCBI RefSeq', 'NCBI Genbank', 'ENSEMBL', 'ENSEMBLGENOMES',
and 'UniProt' databases. Furthermore, an interface to the 'BioMart' database
(Smedley et al. (2009) <doi:10.1186/1471-2164-10-22>) allows users to retrieve
functional annotation for genomic loci. In addition, users can download entire databases such
as 'NCBI RefSeq' (Pruitt et al. (2007) <doi:10.1093/nar/gkl842>), 'NCBI nr',
'NCBI nt', 'NCBI Genbank' (Benson et al. (2013) <doi:10.1093/nar/gks1195>), etc. as
well as 'ENSEMBL' and 'ENSEMBLGENOMES' with only one command.

getCDSSet: CDS retrieval of multiple species

Description

Usage

Arguments

Value

Details

See Also

Examples
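The following is a minimal sketch of how a call could be assembled from the arguments described above, not a definitive recipe. The organism names and the values chosen for <code>reference</code>, <code>clean_retrieval</code> and <code>gunzip</code> are illustrative assumptions; only the defaults for <code>update</code> and <code>path</code> are stated on this page. The download itself is guarded so that nothing is fetched unless the session is interactive and the biomartr package is installed.

```r
# Collect the arguments in a list so they can be inspected before any download.
# The organism names and the logical flags below are illustrative choices.
cds_args <- list(
  db              = "refseq",            # one of "refseq", "genbank", "ensembl"
  organisms       = c("Arabidopsis thaliana",
                      "Arabidopsis lyrata",
                      "Capsella rubella"),
  reference       = FALSE,
  clean_retrieval = TRUE,
  gunzip          = TRUE,
  update          = FALSE,               # keep files that already exist locally
  path            = "set_CDS"            # documented default output folder
)

# Sanity checks mirroring the argument descriptions.
stopifnot(
  cds_args$db %in% c("refseq", "genbank", "ensembl"),
  is.character(cds_args$organisms),
  length(cds_args$organisms) > 0
)

# Contact the database only in an interactive session with biomartr installed.
if (interactive() && requireNamespace("biomartr", quietly = TRUE)) {
  cds_result <- do.call(biomartr::getCDSSet, cds_args)
}
```

Keeping the arguments in a list makes it easy to reuse the same settings with a different <code>db</code> value, for example by changing <code>cds_args$db</code> to <code>"genbank"</code> before the call.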