readGenBank(file, text = readLines(file), partial = NA, ret.seq = TRUE,
verbose = FALSE)
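file: character or GBAccession. The path to the file, or a GBAccession object containing Nuccore versioned accession numbers. Ignored if text is specified.
text: character. The text of the file. Defaults to the text within file.
partial: logical. If TRUE, features with non-exact boundaries will be included. Otherwise, non-exact features are excluded, with a warning if partial is NA (the default).
ret.seq: logical. Should an object containing the raw ORIGIN sequence be created and returned? Defaults to TRUE. If FALSE, the sequence slot is set to NULL. See NOTE.
verbose: logical. Should informative messages be printed to the console as the file is processed? Defaults to FALSE.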
A GenBankRecord object containing most (see details) of the information within file/text, or a list of GenBankRecord objects in cases where a GBAccession vector with more than one ID in it is passed to file.
If a GBAccession object is passed to file, the rentrez package is used to attempt to fetch full GenBank records for all of its IDs.
Often, GenBank files don't contain exhaustive annotations. For example, files including CDS annotations often do not have separate transcript features. Furthermore, chromosomes are not always named, particularly in organisms that have only one. The details of how genbankr handles such cases are as follows:
In files where CDSs are annotated but individual exons are not, 'approximate exons' are defined as the individual contiguous elements within each CDS. Currently, no mixing of approximate and explicitly annotated exons is performed, even in cases where, e.g., exons are not annotated for some genes with CDS annotations.
In files where transcripts are not present, 'approximate transcripts' defined by the ranges spanned by groups of exons are used. Currently, we do not support generating approximate transcripts from CDSs in files that contain actual transcript annotations, even if those annotations do not cover all genes with CDS/exon annotations.
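As a rough illustration of the two fallbacks described above, the base R sketch below works on a hypothetical data frame of CDS segments. The column names, gene labels, and coordinates are invented, grouping exons by gene is an assumption, and this is not genbankr's internal code. Each contiguous CDS segment is treated as an approximate exon, and each approximate transcript is the range spanned by one group of exons.

```r
# Hypothetical CDS segments: one row per contiguous piece of a CDS.
# Column names and coordinates are made up for illustration.
cds_segments <- data.frame(
  gene  = c("geneA", "geneA", "geneA", "geneB"),
  start = c(100L, 300L, 500L, 1000L),
  end   = c(200L, 400L, 650L, 1500L),
  stringsAsFactors = FALSE
)

# With no explicit exon features, each contiguous CDS segment stands in
# as an "approximate exon".
approx_exons <- cds_segments

# With no transcript features, an "approximate transcript" is the range
# spanned by a group of exons: smallest start to largest end per group.
# Grouping by gene is an assumption made for this sketch.
tx_start <- tapply(approx_exons$start, approx_exons$gene, min)
tx_end <- tapply(approx_exons$end, approx_exons$gene, max)
approx_transcripts <- data.frame(
  gene  = names(tx_start),
  start = as.integer(tx_start),
  end   = as.integer(tx_end[names(tx_start)]),
  stringsAsFactors = FALSE
)

print(approx_transcripts)
```

In this toy input, the three geneA segments collapse into a single approximate transcript, while the one-segment geneB keeps its original range.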
Features (gene, CDS, variant, etc.) are assumed to be contained within the
most recent previous source feature (chromosome/physical piece of DNA).
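The containment rule above can be sketched in base R as a running count over features in file order. The feature table here is hypothetical (types and positions are invented) and the sketch only illustrates the idea, not how genbankr implements it.

```r
# Hypothetical features in the order they appear in a file.
# Types and positions are invented for illustration.
features <- data.frame(
  type  = c("source", "gene", "CDS", "source", "gene", "variation"),
  start = c(1L, 50L, 60L, 1L, 10L, 25L),
  stringsAsFactors = FALSE
)

# Running count of source features seen so far: every feature gets the
# index of the most recent source at or before it in file order.
features$source_index <- cumsum(features$type == "source")

# Drop the source rows themselves to leave only the contained features.
contained <- features[features$type != "source", ]
print(contained)
```

Here the first gene and CDS are assigned to the first source feature, and the second gene and the variation to the second.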
Chromosome name for source features (seqnames in the resulting GRanges/VRanges) is determined as follows:
gb = readGenBank(system.file("sample.gbk", package="genbankr"))