procGenome processes annotations for a given transcriptome, either from a TxDb object created by the GenomicFeatures package (e.g. from UCSC) or from a user-provided GRanges object (e.g. obtained by importing a gtf file).
createDenovoGenome creates a de novo annotated genome by combining UCSC annotations and observed RNA-seq data.
procGenome(genDB, genome, mc.cores=1, verbose=TRUE)
createDenovoGenome(reads, DB, minLinks=2,
maxLinkDist=1e+05, maxDist=1000, minConn=2, minJunx=3, minLen=12, mc.cores=1)
genDB is a TxDb object with annotations (e.g. from UCSC or a gtf file), or a GRanges object as returned by import from the rtracklayer package. See details.
Set verbose to TRUE to print progress information.
DB is an annotatedGenome object, as returned by procGenome.
0 disables this option.
reads is a RangedData object, as returned by procBam.
The returned value is an annotatedGenome object.
signature(genDB = "transcriptDb")
genDB is usually obtained with a call to makeTxDbFromUCSC (package GenomicFeatures), e.g. genDB <- makeTxDbFromUCSC(genome="hg19", tablename="refGene").
signature(genDB = "GRanges")
genDB stores information about all transcripts and their respective exons. Chromosome, start, end and strand are stored as usual in GRanges objects. genDB must have a column named "type" taking the value "transcript" for rows corresponding to transcripts and "exon" for rows corresponding to exons. It must also store transcript and gene ids. For instance, the Cufflinks RABT module creates a gtf file with information formatted in this manner for known and de novo predicted isoforms.
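A minimal sketch of the GRanges layout described above, with one gene, one transcript and its two exons. The metadata column names transcript_id and gene_id are an assumption (they match what rtracklayer's import returns for a gtf file), the coordinates and ids are hypothetical, and the procGenome call is left commented out.

```r
## Sketch: minimal GRanges annotation in the layout described above.
## Assumption: transcript and gene ids are stored in columns named
## transcript_id and gene_id (as produced by importing a gtf file).
suppressPackageStartupMessages(library(GenomicRanges))

genDB <- GRanges(
  seqnames = rep("chr1", 3),
  ranges   = IRanges(start = c(1000, 1000, 1800),
                     end   = c(2500, 1200, 2500)),
  strand   = rep("+", 3),
  ## one "transcript" row plus one "exon" row per exon
  type          = c("transcript", "exon", "exon"),
  transcript_id = rep("tx1", 3),   # hypothetical transcript id
  gene_id       = rep("gene1", 3)  # hypothetical gene id
)

## Basic structural checks before handing the object to procGenome
stopifnot("type" %in% names(mcols(genDB)),
          all(genDB$type %in% c("transcript", "exon")))
genDB
## hg19DB <- procGenome(genDB, genome = 'hg19')   # not run
```

Each transcript appears once with type "transcript" and once per exon with type "exon", all rows sharing the same transcript and gene ids.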
If interested in quantifying expression for known transcripts only, one would typically use procGenome with a TxDb built from the usual Bioconductor annotations, e.g. genDB <- makeTxDbFromUCSC(genome="hg19", tablename="refGene"), or imported from a gtf file, e.g. genDB <- makeTxDbFromGFF('transcripts.gtf', format='gtf'), or with a GRanges object (e.g. genDB <- import('transcripts.gtf')).
Package GenomicFeatures contains more information about how to create TxDb objects.
Alternatively, one can provide annotations as a GRanges object, which is what the function import (package rtracklayer) returns when importing a gtf file.
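To see what the import route produces, the sketch below writes a tiny, hypothetical two-exon gtf to a temporary file and imports it with rtracklayer's import. The ids, source field and coordinates are made up for illustration; a real Cufflinks or reference gtf yields the same kind of GRanges with many more rows and columns.

```r
## Sketch: import a tiny hypothetical gtf and inspect the columns that
## procGenome relies on (type, plus transcript and gene ids).
suppressPackageStartupMessages(library(rtracklayer))

gtf <- tempfile(fileext = ".gtf")
attrs <- 'gene_id "gene1"; transcript_id "tx1";'
writeLines(c(
  paste("chr1", "sketch", "transcript", 1000, 2500, ".", "+", ".", attrs, sep = "\t"),
  paste("chr1", "sketch", "exon",       1000, 1200, ".", "+", ".", attrs, sep = "\t"),
  paste("chr1", "sketch", "exon",       1800, 2500, ".", "+", ".", attrs, sep = "\t")
), gtf)

genDB <- import(gtf)          # format inferred from the .gtf extension
table(genDB$type)             # number of rows per feature type
mcols(genDB)[, c("type", "gene_id", "transcript_id")]
## hg19DB <- procGenome(genDB, genome = 'hg19')   # not run
```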
The output from procGenome can be used in combination with wrapKnown, which quantifies expression for a set of known transcripts, or wrapDenovo, which uses Bayesian model selection methods to assess which transcripts are truly expressed.
When using wrapDenovo, you should create a single annotatedGenome object that combines information from all samples (e.g. from a gtf file produced by running your favorite isoform prediction software jointly on all samples), as this increases the power to detect new exons and isoforms.
See annotatedGenome-class for a description of the class.
See methods transcripts to extract the exons in each transcript and getIsland to obtain the island id corresponding to a given transcript id.
See splitGenomeByLength for splitting an annotatedGenome according to gene length.
## Known transcripts from Bioconductor annotations
## library(TxDb.Hsapiens.UCSC.hg19.knownGene)
## hg19DB <- procGenome(TxDb.Hsapiens.UCSC.hg19.knownGene, genome='hg19')
## Alternative using makeTxDbFromUCSC
## genDB<-makeTxDbFromUCSC(genome="hg19", tablename="refGene")
## hg19DB <- procGenome(genDB, "hg19")
## Alternative importing .gtf file
## genDB.Cuff <- import('transcripts.gtf')
## hg19DB.Cuff <- procGenome(genDB.Cuff, genome='hg19')