Learn R Programming

wavClusteR (version 2.6.2)

filterClusters: Merge clusters and compute all relevant cluster statistics

Description

If clusters have been identified using the mini-rank norm algorithm, cluster statistics are computed. In contrast, if the CWT-based cluster identification algorithm was used, clusters are first filtered to retain only those instances containing a wavelet peak and a high-confidence substitution site within their cluster boundaries.

Usage

filterClusters(clusters, highConfSub, coverage, model, genome,
refBase = 'T', minWidth = 12, verbose = TRUE)

Arguments

clusters
GRanges object containing individual clusters as identified by the getClusters function
highConfSub
GRanges object containing high-confidence substitution sites as returned by the getHighConfSub function
coverage
An Rle object containing the coverage at each genomic position as returned by a call to coverage
model
List of 5 items containing the estimated mixture model as returned by the fitMixtureModel function
genome
BSgenome object of the relevant reference genome (e.g. Hsapiens for the human genome hg19)
refBase
A character specifying the base in the reference genome for which transitions are experimentally induced (e.g. 4-SU treatment - a standard in PAR-CLIP experiments - induces T to C transitions and hence refBase = "T" in this case). Default is "T"
minWidth
An integer corresponding to the minimum width of reported clusters. Shorter clusters are extended to minWidth starting from the cluster center
verbose
Logical, if TRUE processing steps are printed

Value

  • GRanges object containing the transcriptome-wide identified clusters, having metadata:
  • NtransitionsThe number of high-confidence transitions within the cluster
  • MeanCovThe mean coverage within the cluster
  • NbasesInRefThe number of genomic positions within the cluster corresponding to refBase
  • CrossLinkEffThe crosslinking efficiency within the cluster, estimated as the ratio between the number of high-confidence transitions within the cluster and the total number of genomic positions therein corresponding to refBase
  • SequenceThe genomic sequence undelying the cluster (plus strand)
  • SumLogOddsThe sum of the log-odd values within the cluster
  • RelLogOddsThe sum of the log-odds divided by the number of high-confidence transitions within the cluster. This variable can be regarded as a proxy for statistical significance and can be therefore used to rank clusters. See Comoglio, Sievers and Paro for details.

References

Herve Pages, BSgenome: Infrastructure for Biostrings-based genome data packages

Sievers C, Schlumpf T, Sawarkar R, Comoglio F and Paro R. (2012) Mixture models and wavelet transforms reveal high confidence RNA-protein interaction sites in MOV10 PAR-CLIP data, Nucleic Acids Res. 40(20):e160. doi: 10.1093/nar/gks697

Comoglio F, Sievers C and Paro R (2015) Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data, BMC Bioinformatics 16, 32.

See Also

getClusters, getHighConfSub, fitMixtureModel

Examples

Run this code
require(BSgenome.Hsapiens.UCSC.hg19)

data( model, package = "wavClusteR" ) 

filename <- system.file( "extdata", "example.bam", package = "wavClusteR" )
example <- readSortedBam( filename = filename )
countTable <- getAllSub( example, minCov = 10, cores = 1 )
highConfSub <- getHighConfSub( countTable, supportStart = 0.2, supportEnd = 0.7, substitution = "TC" )
coverage <- coverage( example )
clusters <- getClusters( highConfSub = highConfSub, 
                         coverage = coverage, 
                         sortedBam = example, 
	                 method = 'mrn', 
	                 cores = 1, 
	                 threshold = 2 ) 

fclusters <- filterClusters( clusters = clusters, 
		             highConfSub = highConfSub, 
        		     coverage = coverage,
			     model = model, 
			     genome = Hsapiens, 
		             refBase = 'T', 
		             minWidth = 12 )
fclusters

Run the code above in your browser using DataLab