A GFF-table is simply a data.frame or tibble with columns adhering to the GFF3 format, see
https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md for details. There is
one row for each feature.
The following columns should always be in a full gff.table of the GFF3 format:
Seqid. A unique identifier of the genomic sequence on which the feature resides.
Source. A description of the procedure that generated the feature, e.g. "R-package micropan::findOrfs".
Type. The type of feature, e.g. "ORF", "16S" etc.
Start. The leftmost coordinate. This is the start if the feature is on the Sense strand, but
the end if it is on the Antisense strand.
End. The rightmost coordinate. This is the end if the feature is on the Sense strand, but
the start if it is on the Antisense strand.
Score. A numeric score (E-value, P-value) from the Source.
Strand. A "+" indicates Sense strand, a "-" Antisense.
Phase. Only relevant for coding genes. The values 0, 1 or 2 indicate the reading frame, i.e.
the number of bases to offset the Start in order to be in the reading frame.
Attributes. A single string with semicolon-separated tokens providing additional information.
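
The column layout above can be illustrated with a small hand-made table. This is a minimal sketch in base R only: the sequence identifier, source label and attribute tokens are made-up values, not output from any package function.

```r
# A hypothetical two-feature GFF-table with the nine columns listed above.
gff.tbl <- data.frame(
  Seqid      = c("contig_1", "contig_1"),
  Source     = c("example", "example"),
  Type       = c("ORF", "ORF"),
  Start      = c(101L, 500L),
  End        = c(400L, 850L),
  Score      = c(NA_real_, NA_real_),
  Strand     = c("+", "-"),
  Phase      = c(0L, 0L),
  Attributes = c("ID=orf_1;Note=first", "ID=orf_2;Note=second"),
  stringsAsFactors = FALSE
)

# Start is always the leftmost coordinate, so the length does not depend on Strand.
feature.length <- gff.tbl$End - gff.tbl$Start + 1

# The biological start is Start on the Sense strand and End on the Antisense strand.
biological.start <- ifelse(gff.tbl$Strand == "+", gff.tbl$Start, gff.tbl$End)

# Attributes is one string per feature; split on semicolons to get the tokens.
tokens <- strsplit(gff.tbl$Attributes, ";", fixed = TRUE)
```

Here the second feature lies on the Antisense strand, so its biological start is taken from the End column.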
Missing values are described by "." in the GFF3 format. This is also done here, except for the
numerical columns Start, End, Score and Phase. Here NA is used, but this is replaced by "."
when writing to file.
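
The replacement of NA by "." can be sketched with base R's write.table and its na argument. This only illustrates the convention and is not necessarily how the package writes its files; the table content and the temporary file are made up for the example.

```r
# A hypothetical one-feature table with missing Score and Phase.
gff.tbl <- data.frame(
  Seqid = "contig_1", Source = "example", Type = "ORF",
  Start = 101L, End = 400L, Score = NA_real_,
  Strand = "+", Phase = NA_integer_, Attributes = "ID=orf_1",
  stringsAsFactors = FALSE
)

# Write the version header, then the tab-separated table with NA written as ".".
out.file <- tempfile(fileext = ".gff")
writeLines("##gff-version 3", out.file)
write.table(gff.tbl, file = out.file, append = TRUE, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE, na = ".")

# Inspect the result: the Score and Phase fields now hold ".".
gff.lines <- readLines(out.file)
```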
The readGFF function will also read files where sequences in FASTA format are added after the
GFF-table. This file section must always start with the line ##FASTA. This Fasta object is added
to the GFF-table as an attribute (use attr(gff.tbl, "Fasta") to retrieve it).
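
How a file with a ##FASTA section might be split and the sequences attached as an attribute can be sketched in base R. This is an illustration under stated assumptions, not the readGFF implementation: the input lines are made up, and the Header and Sequence column names of the sequence table are assumed for the example.

```r
# Hypothetical file content: one feature followed by a FASTA section.
gff.lines <- c(
  "##gff-version 3",
  "contig_1\texample\tORF\t1\t9\t.\t+\t0\tID=orf_1",
  "##FASTA",
  ">contig_1",
  "ATGAAA",
  "TAA"
)

# Split at the ##FASTA line; drop comment lines from the table part.
fasta.start <- which(gff.lines == "##FASTA")[1]
table.lines <- gff.lines[seq_len(fasta.start - 1)]
table.lines <- table.lines[!startsWith(table.lines, "#")]
fasta.lines <- gff.lines[-seq_len(fasta.start)]

# Read all nine columns as text, then turn "." into NA in the numerical columns only.
gff.tbl <- read.table(text = table.lines, sep = "\t", quote = "", comment.char = "",
                      colClasses = "character",
                      col.names = c("Seqid", "Source", "Type", "Start", "End",
                                    "Score", "Strand", "Phase", "Attributes"))
num.cols <- c("Start", "End", "Score", "Phase")
gff.tbl[num.cols] <- lapply(gff.tbl[num.cols],
                            function(x) as.numeric(replace(x, x == ".", NA)))

# Collapse multi-line sequences into one string per header line.
is.header <- startsWith(fasta.lines, ">")
record.id <- cumsum(is.header)
fasta <- data.frame(
  Header   = sub("^>", "", fasta.lines[is.header]),
  Sequence = unname(vapply(split(fasta.lines[!is.header], record.id[!is.header]),
                           paste, character(1), collapse = "")),
  stringsAsFactors = FALSE
)

# Attach the sequences to the table and retrieve them again.
attr(gff.tbl, "Fasta") <- fasta
retrieved <- attr(gff.tbl, "Fasta")
```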