findCompatibleOverlaps
(or countCompatibleOverlaps
) can be used for finding (or counting)
hits between reads and transcripts that are compatible
with the splicing of the transcript.
findCompatibleOverlaps(query, subject)
countCompatibleOverlaps(query, subject)
findCompatibleOverlaps
.An integer vector parallel to (i.e. same length as) query
for countCompatibleOverlaps
.
findCompatibleOverlaps
is a specialized version of
findOverlaps
that uses
encodeOverlaps
internally to keep only
the hits where the junctions in the aligned read are compatible
with the splicing of the annotated transcript. The topic of working with overlap encodings is covered in details
in the "OverlapEncodings" vignette located this package
(GenomicAlignments) and accessible with
vignette("OverlapEncodings")
.
findOverlaps
generic function defined
in the IRanges package.
encodeOverlaps
generic function and
OverlapEncodings class.
## Here we only show a simple example illustrating the use of
## countCompatibleOverlaps() on a very small data set. Please
## refer to the "OverlapEncodings" vignette in the GenomicAlignments
## package for a comprehensive presentation of "overlap
## encodings" and related tools/concepts (e.g. "compatible"
## overlaps, "almost compatible" overlaps etc...), and for more
## examples.
## sm_treated1.bam contains a small subset of treated1.bam, a BAM
## file containing single-end reads from the "Pasilla" experiment
## (RNA-seq, Fly, see the pasilla data package for the details)
## and aligned to reference genome BDGP Release 5 (aka dm3 genome on
## the UCSC Genome Browser):
sm_treated1 <- system.file("extdata", "sm_treated1.bam",
package="GenomicAlignments", mustWork=TRUE)
## Load the alignments:
flag0 <- scanBamFlag(isDuplicate=FALSE, isNotPassingQualityControls=FALSE)
param0 <- ScanBamParam(flag=flag0)
gal <- readGAlignments(sm_treated1, use.names=TRUE, param=param0)
## Load the transcripts (IMPORTANT: Like always, the reference genome
## of the transcripts must be *exactly* the same as the reference
## genome used to align the reads):
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
exbytx <- exonsBy(txdb, by="tx", use.names=TRUE)
## Number of "compatible" transcripts per alignment in 'gal':
gal_ncomptx <- countCompatibleOverlaps(gal, exbytx)
mcols(gal)$ncomptx <- gal_ncomptx
table(gal_ncomptx)
mean(gal_ncomptx >= 1)
## --> 33% of the alignments in 'gal' are "compatible" with at least
## 1 transcript in 'exbytx'.
## Keep only alignments compatible with at least 1 transcript in
## 'exbytx':
compgal <- gal[gal_ncomptx >= 1]
head(compgal)
Run the code above in your browser using DataLab