GenomicFeatures (version 1.18.7)

makeTranscriptDbFromGFF: Make a TxDb object from annotations available as a GFF3 or GTF file


The makeTranscriptDbFromGFF function allows the user to make a TxDb object from transcript annotations available as a GFF3 or GTF file.


Usage

makeTranscriptDbFromGFF(file, format=c("gff3","gtf"), exonRankAttributeName=NA, gffGeneIdAttributeName=NA, chrominfo=NA, dataSource=NA, species=NA, circ_seqs=DEFAULT_CIRC_SEQS, miRBaseBuild=NA, useGenesAsTranscripts=FALSE)


Arguments

file: Path to the file to be processed.
format: "gff3" or "gtf", depending on the format of the file being processed.
exonRankAttributeName: character(1) name of the attribute that holds the exon rank information, or NA to indicate that exon ranks should be inferred from the order in which exons occur in the file.
gffGeneIdAttributeName: An optional argument for GFF3 files ONLY. If the file has no rows that specify gene IDs, but its mRNA rows give the gene IDs through a named attribute, pass the name of that attribute here so that gene IDs can still be mapped to these transcripts. If left as NA, the parser looks for rows of type 'gene' to build the gene-to-transcript mapping when parsing a GFF3 file. For GTF files this argument is ignored entirely.
chrominfo: Data frame containing information about the chromosomes. It is passed to the internal call to makeTranscriptDb; see ?makeTranscriptDb for details.
dataSource: Where did this data file originate? Please be as specific as possible.
species: The genus and species of the organism. Please use proper scientific nomenclature, for example "Homo sapiens" or "Canis familiaris", and not "human" or "my fuzzy buddy". If written correctly, this information may be used by the software to help you later.
circ_seqs: A character vector listing the chromosomes that should be marked as circular.
miRBaseBuild: A string specifying the miRBase build (from mirbase.db) to use for microRNAs. Valid values can be listed by calling supportedMiRBaseBuildValues. The default, NA, disables the microRNAs accessor.
useGenesAsTranscripts: This flag is off by default. If enabled, it tries to salvage a file that has no RNA features by using the ranges of the gene features in their place. You will not want to do this unless you are dealing with a very simple genome, such as a prokaryote.
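To make the role of gffGeneIdAttributeName more concrete, here is a minimal base R sketch of what "gene IDs given through a named attribute on the mRNA rows" means in practice. The helper get_gff3_attribute() and the attribute name geneID are hypothetical illustrations, not part of GenomicFeatures, and this is not how the package's own parser works; it also ignores GFF3 details such as URL-escaped values and multi-valued attributes.

```r
## Hypothetical helper (NOT part of GenomicFeatures): extract one named
## attribute from GFF3 column-9 strings such as "ID=tx1;geneID=g1".
## Returns NA when the attribute is absent. Ignores escaping and
## comma-separated multi-values.
get_gff3_attribute <- function(attrs, name) {
  vapply(strsplit(attrs, ";", fixed = TRUE), function(fields) {
    key_val <- strsplit(trimws(fields), "=", fixed = TRUE)
    keys <- vapply(key_val, `[`, character(1), 1)
    hit <- which(keys == name)
    if (length(hit) == 0L) NA_character_ else key_val[[hit[1L]]][2L]
  }, character(1), USE.NAMES = FALSE)
}

## Made-up mRNA rows whose gene ID sits in a (hypothetical) "geneID" attribute.
gff_mrna <- data.frame(
  type = c("mRNA", "mRNA", "mRNA"),
  attributes = c("ID=tx1;geneID=g1", "ID=tx2;geneID=g1", "ID=tx3;geneID=g2"),
  stringsAsFactors = FALSE
)

## Transcript-to-gene mapping recovered without any 'gene' rows.
tx2gene <- data.frame(
  tx_name = get_gff3_attribute(gff_mrna$attributes, "ID"),
  gene_id = get_gff3_attribute(gff_mrna$attributes, "geneID"),
  stringsAsFactors = FALSE
)
print(tx2gene)
```

In the real function, the equivalent idea would be expressed by passing gffGeneIdAttributeName="geneID" (again, a hypothetical attribute name) instead of parsing the attributes yourself.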


Value

A TxDb object.


Details

makeTranscriptDbFromGFF is a convenience function that feeds data from the parsed file to the lower-level makeTranscriptDb function.
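Because makeTranscriptDbFromGFF mainly parses the file and hands the result on, it can help to picture that hand-off. The base R sketch below builds a made-up, one-transcript example in the data-frame shapes we understand ?makeTranscriptDb to expect (transcripts, splicings, genes and chrominfo). The column names are an assumption to verify against ?makeTranscriptDb, and the actual call is left commented out so the sketch runs without GenomicFeatures installed.

```r
## Made-up two-exon transcript on the minus strand. Column names follow our
## reading of ?makeTranscriptDb; check that page before relying on them.
transcripts <- data.frame(
  tx_id     = 1L,
  tx_name   = "tx1",
  tx_chrom  = "supercont1.1",
  tx_strand = "-",
  tx_start  = 100L,
  tx_end    = 900L,
  stringsAsFactors = FALSE
)

## On the minus strand, rank 1 is the exon with the highest coordinates.
splicings <- data.frame(
  tx_id      = c(1L, 1L),
  exon_rank  = c(1L, 2L),
  exon_start = c(700L, 100L),
  exon_end   = c(900L, 300L)
)

genes <- data.frame(tx_id = 1L, gene_id = "g1", stringsAsFactors = FALSE)

chrominfo <- data.frame(chrom = "supercont1.1", length = 5220442,
                        is_circular = FALSE, stringsAsFactors = FALSE)

## Sanity check: every exon lies inside the extent of its transcript.
tx_row <- match(splicings$tx_id, transcripts$tx_id)
stopifnot(all(splicings$exon_start >= transcripts$tx_start[tx_row]),
          all(splicings$exon_end   <= transcripts$tx_end[tx_row]))

## With GenomicFeatures loaded, these tables would be passed on roughly as
## follows (signature assumed; see ?makeTranscriptDb):
## txdb <- makeTranscriptDb(transcripts, splicings, genes = genes,
##                          chrominfo = chrominfo)
```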

There are some real deficiencies in the GTF and GFF3 file formats to bear in mind when using them. GTF files do not normally encode the extent of each transcript, so it has to be inferred from the exon ranges presented. That is not a serious problem, but it bears mentioning for the sake of full disclosure. GFF3 files are typically even worse, since they usually don't encode any information about the rank of an exon within its transcript. This is a serious oversight, so if you have an alternative to this kind of data, you should really use it. Some files define an attribute that indicates the exon rank. In GTF files this is usually "exon_number", but you still must pass it as exonRankAttributeName if you don't want the code to infer the exon ranks. We have not seen this information encoded in any GFF3 file, but if your file does have such an attribute, you can likewise specify it to avoid the inference.
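The two inferences described above can be sketched in a few lines of base R: a transcript's extent is taken as the span of its exons, and an exon's rank is read from an exon_number-style attribute when present, falling back to order of occurrence within the transcript otherwise. The exon table is made up, and this illustrates the idea under those assumptions rather than reproducing the package's code. Order of occurrence only gives correct ranks if the file lists each transcript's exons in transcription order (for a minus-strand transcript, that means descending coordinates).

```r
## Made-up exon rows as they might be parsed from a GTF file (one row per exon).
## tx1 carries an exon_number attribute; tx2 does not.
exons <- data.frame(
  transcript_id = c("tx1", "tx1", "tx1", "tx2", "tx2"),
  start         = c(100L, 400L, 800L, 2000L, 2600L),
  end           = c(250L, 500L, 950L, 2100L, 2700L),
  exon_number   = c("1", "2", "3", NA, NA),
  stringsAsFactors = FALSE
)

## Transcript extent is not stored, so take the span of each transcript's exons.
tx_start <- tapply(exons$start, exons$transcript_id, min)
tx_end   <- tapply(exons$end,   exons$transcript_id, max)
tx_ranges <- data.frame(
  transcript_id = names(tx_start),
  tx_start      = as.integer(tx_start),
  tx_end        = as.integer(tx_end[names(tx_start)]),
  stringsAsFactors = FALSE
)

## Rank by order of occurrence within each transcript (1, 2, 3, ...).
occurrence_rank <- ave(seq_len(nrow(exons)), exons$transcript_id,
                       FUN = seq_along)

## Prefer the explicit attribute where present; otherwise fall back.
exons$exon_rank <- ifelse(is.na(exons$exon_number),
                          occurrence_rank,
                          as.integer(exons$exon_number))

print(tx_ranges)
print(exons[, c("transcript_id", "start", "end", "exon_rank")])
```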

See Also

DEFAULT_CIRC_SEQS, makeTranscriptDbFromUCSC, makeTranscriptDbFromBiomart, makeTranscriptDb, supportedMiRBaseBuildValues


Examples
library(GenomicFeatures)

## TESTING GFF3 (format defaults to "gff3")
gffFile <- system.file("extdata","a.gff3",package="GenomicFeatures")
txdb <- makeTranscriptDbFromGFF(file=gffFile,
            dataSource="partial gff3 file for Tomatoes for testing",
            species="Solanum lycopersicum")

## TESTING GTF, this time specifying the chrominfo
gtfFile <- system.file("extdata","Aedes_aegypti.partial.gtf",
                       package="GenomicFeatures")
chrominfo <- data.frame(chrom = c('supercont1.1','supercont1.2'),
                        length=c(5220442, 5300000),
                        is_circular=c(FALSE, FALSE))
txdb2 <- makeTranscriptDbFromGFF(file=gtfFile,
             format="gtf",           # required: the default format is "gff3"
             chrominfo=chrominfo,
             species="Aedes aegypti")

