polyester (version 1.8.3)

seq_gtf: Get transcript sequences from GTF file and sequence info

Description

Given a GTF file (for transcript structure) and DNA sequences, return a DNAStringSet of transcript sequences

Usage

seq_gtf(gtf, seqs, exononly = TRUE, idfield = "transcript_id", attrsep = "; ")

Arguments

gtf
either the path to a GTF file or a data frame representing a canonical GTF file.
seqs
either the path to a folder containing one FASTA file (.fa extension) for each chromosome in gtf, or a named DNAStringSet containing one DNAString per chromosome in gtf, representing its sequence. In the latter case, names(seqs) should contain the same entries as the seqnames (first) column of gtf.
exononly
if TRUE (as it is by default), only create transcript sequences from the features labeled exon in gtf.
idfield
the name of the field in the attributes column of gtf that identifies transcripts. Should be a character string. Default "transcript_id".
attrsep
the string that separates attributes in the attributes column of gtf. Default "; ".
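
To illustrate how idfield and attrsep relate to the attributes column, here is a minimal base R sketch that pulls a transcript identifier out of a single attributes string. It is not the package's internal implementation; the attribute string and the identifiers GENE1 and TX1 are hypothetical values made up for the example.

```r
# Hypothetical attributes string, as it might appear in the last column of a GTF row
attrs <- 'gene_id "GENE1"; transcript_id "TX1"; exon_number "1";'

# Values corresponding to the attrsep and idfield defaults
attrsep <- "; "
idfield <- "transcript_id"

# Split the string into individual attributes on the separator
fields <- strsplit(attrs, attrsep, fixed = TRUE)[[1]]

# Keep the attribute whose name matches idfield
hit <- fields[startsWith(fields, paste0(idfield, " "))]

# Drop the field name, then strip quotes and any trailing semicolon
transcript <- gsub('"|;', "", sub(paste0("^", idfield, " "), "", hit))
transcript
```

If a GTF file separates attributes with ";" and no following space, the same split would need attrsep set accordingly.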

Value

DNAStringSet containing transcript sequences, with names corresponding to idfield in gtf

References

http://www.ensembl.org/info/website/upload/gff.html

Examples

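library(polyester)
library(Biostrings)
# chr22seq: chromosome 22 sequence, loaded from a remote .rda file
load(url('http://biostat.jhsph.edu/~afrazee/chr22seq.rda'))
# gtf_dataframe: example annotation data frame shipped with polyester
data(gtf_dataframe)
chr22_processed = seq_gtf(gtf_dataframe, chr22seq)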
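
When seqs is a named DNAStringSet, its names should cover every entry in the seqnames (first) column of gtf. The sketch below shows one way to check this before calling seq_gtf. It is written in base R with hypothetical character vectors standing in for the first column of gtf and for names(seqs), and it is not part of polyester itself.

```r
# Hypothetical stand-ins: the first column of a GTF data frame and names(seqs)
gtf_seqnames <- c("chr22", "chr22", "chr21")
seq_names <- c("chr22")

# Chromosomes that appear in the annotation but have no matching sequence
missing_chr <- setdiff(unique(gtf_seqnames), seq_names)

if (length(missing_chr) > 0) {
  message("No sequence supplied for: ", paste(missing_chr, collapse = ", "))
}
```

With real inputs, gtf_seqnames would be taken from the first column of the GTF data frame and seq_names from names(seqs).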
