getMetaGene: Compute and plot metagene profile using identified clusters

Description

Transcriptome-wide identified clusters are used to generate a metagene profile by binning gene bodies, upstream and downstream regions.

Usage

getMetaGene(clusters, txDB = NULL, upstream = 1e3, downstream = 1e3,
nBins = 40, nBinsUD = 10, minLength = 1, genome = 'hg19', tablename =
'ensGene', plot = TRUE, verbose = TRUE, ...)

Arguments

clusters

GRanges object containing individual clusters as identified by the getClusters function

txDB

TranscriptDb object obtained through a call to the makeTxDbFromUCSC function in the GenomicFeatures package. Default is NULL, namely the object will be fetched internally

upstream

An integer corresponding to the number of bases to be considered upstream the gene. Default is 1000

downstream

An integer corresponding to the number of bases to be considered downstream the gene. Default is 1000

nBins

An integer corresponding to the number of bins to be used to partition the genes. Default is 40

nBinsUD

An integer corresponding to the number of bins to be used to partion upstream and downstream regions. Defauls is 10, i.e. the bin size is 100 bases for the default extension

minLength

An integer indicating the the minimum required length of a gene in order for it to be considered. Default is 1, i.e. all genes are considered

genome

A character specifying the genome abbreviation used by UCSC. Available abbreviations are returned by a call to

ucscGenomes()[ ,
"db"]

. Default is "hg19" (human genome)

tablename

A character specifying the name of the UCSC table containing the transcript annotations to retrieve. Available table names are returned by a call to supportedUCSCtables(). Default is "ensGene", namely ensembl gene annotations

plot

Logical, if TRUE a dotchart with cluster annotations is produced

verbose

Logical, if TRUE processing steps are printed

...

Additional parameters to be passed to the plot function

Value

A numeric vector of the same length as nBins + 2 * nBinsUD containing normalized counts. If plot, the metagene profile is also depicted as a line plot.

References

Comoglio F*, Sievers C* and Paro R, wavClusteR: an R package for PAR-CLIP data analysis, submitted

Examples

Run this code

require(BSgenome.Hsapiens.UCSC.hg19)

data( model, package = "wavClusteR" ) 

filename <- system.file( "extdata", "example.bam", package = "wavClusteR" )
example <- readSortedBam( filename = filename )
countTable <- getAllSub( example, minCov = 10, cores = 1 )
highConfSub <- getHighConfSub( countTable, supportStart = 0.2, supportEnd = 0.7, substitution = "TC" )
coverage <- coverage( example )
clusters <- getClusters( highConfSub = highConfSub, 
                         coverage = coverage, 
                         sortedBam = example, 
	                 method = 'mrn', 
	                 cores = 1, 
	                 threshold = 2 ) 

fclusters <- filterClusters( clusters = clusters, 
		             highConfSub = highConfSub, 
        		     coverage = coverage,
			     model = model, 
			     genome = Hsapiens, 
		             refBase = 'T', 
		             minWidth = 12 )
meta <- getMetaGene( clusters = fclusters )