Downloads FASTA sequence files from the NCBI nr, SWISSPROT/UNIPROT,
or RCSB PDB databases.
Usage
get.seq(ids, outfile = "seqs.fasta", db = "nr", verbose = FALSE)
Arguments
ids
A character vector of one or more appropriate database
codes/identifiers of the files to be downloaded.
outfile
A single element character vector specifying the name
of the local file to which sequences will be written.
db
A single element character vector specifying the database
from which sequences are to be obtained.
verbose
Logical; if TRUE, URL details of the download process
are printed.
Value
If all files are successfully downloaded a list object with two
components is returned:
ali
an alignment character matrix with a row per sequence and
a column per equivalent amino acid/nucleotide.
ids
sequence names as identifiers.
This is similar to the object returned by read.fasta. However,
if some files were not successfully downloaded, a vector detailing
which ids were not found is returned instead.
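Because the return value has two possible shapes, calling code may want to check which one it received before using it. The following is a minimal sketch of that check, not part of bio3d: handle_seqs is a hypothetical helper, and the mock values only stand in for real downloads. It assumes that a successful result is a list with an ali component and that anything else is the failure report, which it passes through unchanged because the exact type of that vector is not stated here.

```r
# Hypothetical helper (not part of bio3d): classify what get.seq() returned.
handle_seqs <- function(res) {
  if (is.list(res) && !is.null(res$ali)) {
    # Success: a list with an alignment matrix, one row per sequence
    list(ok = TRUE, n = nrow(res$ali), failed = NULL)
  } else {
    # Failure: assumed to be the vector describing ids that were not found;
    # returned as-is since its exact type is not documented above
    list(ok = FALSE, n = 0L, failed = res)
  }
}

# Mock values standing in for real downloads (no network access needed)
ok_res  <- list(ali = matrix(c("M", "T", "E",
                               "M", "T", "E"), nrow = 2, byrow = TRUE),
                ids = c("seqA", "seqB"))
bad_res <- "XXXX"

good <- handle_seqs(ok_res)
bad  <- handle_seqs(bad_res)
```

In real use, the argument would be the result of a call such as get.seq( c("P01112", "Q61411", "P20171") ), and the ok flag would decide whether to continue with the alignment or report the failed identifiers.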
Details
This is a basic function to automate sequence file download from
databases including NCBI nr, SWISSPROT/UNIPROT, and RCSB PDB.
References
Grant, B.J. et al. (2006) Bioinformatics 22, 2695–2696.
Examples
# NOT RUN {
## Sequence identifiers (GI or PDB codes e.g. from blast.pdb etc.)
get.seq( c("P01112", "Q61411", "P20171") )

#aa <- get.seq( c("4q21", "5p21") )
#aa$id
#aa$ali
# }