casper (version 2.6.0)

procGenome: Create an annotatedGenome object that stores information about genes and transcripts

Description

procGenome processes annotations for a given transcriptome, either from a TxDb object created by the GenomicFeatures package (e.g. from UCSC) or from a user-provided GRanges object (e.g. obtained by importing a gtf file). createDenovoGenome creates a de novo annotated genome by combining UCSC annotations and observed RNA-seq data.

Usage

procGenome(genDB, genome, mc.cores=1, verbose=TRUE)
createDenovoGenome(reads, DB, minLinks=2, maxLinkDist=1e+05, maxDist=1000, minConn=2, minJunx=3, minLen=12, mc.cores=1)

Arguments

genDB
Either a TxDb object with annotations (e.g. from UCSC or a gtf file) or a GRanges object, as returned by import from the rtracklayer package. See Details.
genome
Character indicating genome version (e.g. "hg19", "dm3")
mc.cores
Number of cores to use in parallel processing (multicore package required)
verbose
Set to TRUE to print progress information
DB
annotatedGenome object, as returned by procGenome
minLinks
Minimum number of reads joining two exons to merge their corresponding genes
maxLinkDist
Maximum distance between two exons to merge their corresponding genes. A value of 0 disables this option.
maxDist
Maximum distance between two exons with reads joining them to merge their corresponding genes.
minConn
Minimum number of fragments connecting a new exon to an annotated one for the new exon to be added to the de novo genome.
minJunx
Minimum number of junctions needed to redefine an annotated exon's end or start.
minLen
Minimum length of a junction to consider as a putative intron.
reads
Processed reads stored in a RangedData, as returned by procBam

Value

Object of class annotatedGenome.

Methods

signature(genDB = "transcriptDb")
genDB is usually obtained with a call to makeTxDbFromUCSC (package GenomicFeatures), e.g. genDB<-makeTxDbFromUCSC(genome="hg19", tablename="refGene")
signature(genDB = "GRanges")
genDB stores information about all transcripts and their respective exons. Chromosome, start, end and strand are stored as usual in GRanges objects. genDB must have a column named "type" taking the value "transcript" for rows corresponding to transcripts and "exon" for rows corresponding to exons. It must also store transcript and gene ids. For instance, the Cufflinks RABT module creates a gtf file with information formatted in this manner for known and de novo predicted isoforms.
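
As an illustration of the GRanges layout described above, here is a minimal sketch using a toy annotation table with made-up coordinates. The column names gene_id and transcript_id follow the usual gtf convention and are an assumption here, since this page only says that transcript and gene ids must be stored. The helper checkAnnot is hypothetical and not part of casper. Converting to GRanges is attempted only if the GenomicRanges package is installed.

```r
# Toy annotation table: one gene, one transcript, two exons (made-up coordinates).
# Assumption: ids live in columns named gene_id and transcript_id, as in gtf files.
annot <- data.frame(
  seqnames      = "chr1",
  start         = c(1000, 1000, 1500),
  end           = c(2000, 1200, 2000),
  strand        = "+",
  type          = c("transcript", "exon", "exon"),
  gene_id       = "geneA",
  transcript_id = "txA1",
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of casper): check that the table carries the
# "type" column and the id columns, with both transcript and exon rows present.
checkAnnot <- function(x) {
  required <- c("type", "gene_id", "transcript_id")
  missingCols <- setdiff(required, names(x))
  if (length(missingCols) > 0) {
    stop("missing column(s): ", paste(missingCols, collapse = ", "))
  }
  if (!any(x$type == "transcript") || !any(x$type == "exon")) {
    stop("need at least one 'transcript' row and one 'exon' row")
  }
  invisible(TRUE)
}

checkAnnot(annot)

# Optional: build the GRanges object, keeping type and the ids as metadata columns.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  genDB <- GenomicRanges::makeGRangesFromDataFrame(annot, keep.extra.columns = TRUE)
  ## hg19DB <- procGenome(genDB, genome = "hg19")  # requires casper
}
```

In practice the GRanges object would usually come from import on a gtf file rather than being built by hand; the table above is only meant to show which columns are expected.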

Details

These functions create the annotation objects that are needed for subsequent functions. Typically these objects are created only once for a set of samples.

If interested in quantifying expression for known transcripts only, one would typically use procGenome with a TxDb from the usual Bioconductor annotations, e.g. genDB<-makeTxDbFromUCSC(genome="hg19",tablename="refGene"), or imported from a gtf file, e.g. genDB<-makeTxDbFromGFF('transcripts.gtf',format='gtf'). Package GenomicFeatures contains more info about how to create TxDb objects. Alternatively, one can provide annotations as a GRanges object, which is returned when importing a gtf file with function import (package rtracklayer), e.g. genDB <- import('transcripts.gtf').

The output from procGenome can be used in combination with wrapKnown, which quantifies expression for a set of known transcripts, or wrapDenovo, which uses Bayesian model selection methods to assess which transcripts are truly expressed. When using wrapDenovo, you should create a single annotatedGenome object that combines information from all samples (e.g. from a gtf file produced by running your favorite isoform prediction software jointly on all samples), as this increases the power to detect new exons and isoforms.

See Also

See annotatedGenome-class for a description of the class. See method transcripts to extract the exons in each transcript, getIsland to obtain the island id corresponding to a given transcript id, and splitGenomeByLength for splitting an annotatedGenome according to gene length.

Examples

## Known transcripts from Bioconductor annotations
## library(TxDb.Hsapiens.UCSC.hg19.knownGene)
## hg19DB <- procGenome(TxDb.Hsapiens.UCSC.hg19.knownGene, genome='hg19')

## Alternative using makeTxDbFromUCSC (package GenomicFeatures)
## library(GenomicFeatures)
## genDB <- makeTxDbFromUCSC(genome="hg19", tablename="refGene")
## hg19DB <- procGenome(genDB, "hg19")

## Alternative importing .gtf file (import is from package rtracklayer)
## library(rtracklayer)
## genDB.Cuff <- import('transcripts.gtf')
## hg19DB.Cuff <- procGenome(genDB.Cuff, genome='hg19')
