Character string with the file name of the GTF file. File formats from UCSC and ENSEMBL are supported, as is gzip compression.
chromosomes
A character vector of chromosome names. Restricts the output to the chromosomes that match, ignoring case.
refseq_nm
An option for GTF files based on RefSeq annotation. If TRUE, only identifiers beginning with NM_ are used.
gtf_feature
Defines the GTF feature types to be returned.
transcript_id
Defines the name of the attribute within the attribute list that is used as the transcript ID.
gene_id
Defines the name of the attribute within the attribute list that is used as the gene ID.
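
How these arguments might act on parsed GTF rows can be sketched in base R. This is an illustration under stated assumptions, not the function's actual implementation: the helper get_attr, the column names of the data frame and the sample records are all hypothetical.

```r
# Hypothetical helper: extract the value of one named attribute
# (e.g. transcript_id) from GTF attribute strings (9th column).
get_attr <- function(attributes, name) {
  pattern <- paste0(".*\\b", name, " \"([^\"]*)\".*")
  ifelse(grepl(pattern, attributes, perl = TRUE),
         sub(pattern, "\\1", attributes, perl = TRUE),
         NA_character_)
}

# Made-up records standing in for rows of a GTF file.
gtf <- data.frame(
  seqname = c("chr1", "CHR1", "chr2", "chrX"),
  feature = c("exon", "exon", "exon", "CDS"),
  attributes = c(
    'gene_id "G1"; transcript_id "NM_0001";',
    'gene_id "G1"; transcript_id "NR_0002";',
    'gene_id "G2"; transcript_id "NM_0003";',
    'gene_id "G3"; transcript_id "NM_0004";'
  ),
  stringsAsFactors = FALSE
)

chromosomes <- c("chr1")

# chromosomes: case-insensitive match on the sequence name.
keep_chr <- tolower(gtf$seqname) %in% tolower(chromosomes)

# transcript_id / gene_id: choose which attribute supplies each ID.
tx   <- get_attr(gtf$attributes, "transcript_id")
gene <- get_attr(gtf$attributes, "gene_id")

# refseq_nm = TRUE: keep only identifiers beginning with NM_.
keep_nm <- grepl("^NM_", tx)

# gtf_feature: keep only the requested feature types.
keep_feature <- gtf$feature %in% "exon"

selected <- gtf[keep_chr & keep_nm & keep_feature, ]
```

In this sketch only the first record passes all three filters: the second is an NR_ transcript, the third lies on another chromosome, and the fourth is neither on chr1 nor an exon.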
Value
GenomicRanges object with one row per exon. Row names are transcript IDs, and an exon_id is provided.
Details
This function parses GTF files generated by the UCSC table browser or downloaded from the ENSEMBL ftp server. It uses only rows with an 'exon' tag in the feature column (3rd column). The transcript name is generated from the 'transcript' entry in the attribute column (9th column). The exons of each transcript are numbered by applying the make.unique function to the transcript name, and the resulting names are used as row names.
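
The row naming described above can be illustrated with a short base R sketch. The transcript names and the data frame layout are made up for the example, and the real function returns a GenomicRanges object rather than a data frame.

```r
# Made-up rows: feature column (3rd column) plus the transcript name
# already taken from the attribute column (9th column).
rows <- data.frame(
  feature    = c("exon", "CDS", "exon", "exon", "exon"),
  transcript = c("NM_0001", "NM_0001", "NM_0001", "NM_0001", "NM_0003"),
  start      = c(100L, 120L, 300L, 500L, 900L),
  stringsAsFactors = FALSE
)

# Keep only rows tagged 'exon' in the feature column.
exons <- rows[rows$feature == "exon", ]

# make.unique leaves the first occurrence unchanged and appends
# .1, .2, ... to repeats, so each exon gets a distinct row name.
rownames(exons) <- make.unique(exons$transcript)

rownames(exons)
# "NM_0001" "NM_0001.1" "NM_0001.2" "NM_0003"
```

Note that the first exon of a transcript keeps the bare transcript name; only later exons receive a numeric suffix.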