findCompatibleOverlaps-methods: Finding hits between reads and transcripts that are compatible with the splicing of the transcript

Description

In the context of an RNA-seq experiment, findCompatibleOverlaps (or countCompatibleOverlaps) can be used for finding (or counting) hits between reads and transcripts that are compatible with the splicing of the transcript.

Usage

findCompatibleOverlaps(query, subject)
countCompatibleOverlaps(query, subject)

Arguments

query

A GAlignments or GAlignmentPairs object representing the aligned reads.

subject

A GRangesList object representing the transcripts.

Value

A Hits object for findCompatibleOverlaps.An integer vector parallel to (i.e. same length as) query for countCompatibleOverlaps.

Details

findCompatibleOverlaps is a specialized version of findOverlaps that uses encodeOverlaps internally to keep only the hits where the junctions in the aligned read are compatible with the splicing of the annotated transcript.

The topic of working with overlap encodings is covered in details in the "OverlapEncodings" vignette located this package (GenomicAlignments) and accessible with vignette("OverlapEncodings").

Examples

Run this code

## Here we only show a simple example illustrating the use of
## countCompatibleOverlaps() on a very small data set. Please
## refer to the "OverlapEncodings" vignette in the GenomicAlignments
## package for a comprehensive presentation of "overlap
## encodings" and related tools/concepts (e.g. "compatible"
## overlaps, "almost compatible" overlaps etc...), and for more
## examples.

## sm_treated1.bam contains a small subset of treated1.bam, a BAM
## file containing single-end reads from the "Pasilla" experiment
## (RNA-seq, Fly, see the pasilla data package for the details)
## and aligned to reference genome BDGP Release 5 (aka dm3 genome on
## the UCSC Genome Browser):
sm_treated1 <- system.file("extdata", "sm_treated1.bam",
                           package="GenomicAlignments", mustWork=TRUE)

## Load the alignments:
flag0 <- scanBamFlag(isDuplicate=FALSE, isNotPassingQualityControls=FALSE)
param0 <- ScanBamParam(flag=flag0)
gal <- readGAlignments(sm_treated1, use.names=TRUE, param=param0)

## Load the transcripts (IMPORTANT: Like always, the reference genome
## of the transcripts must be *exactly* the same as the reference
## genome used to align the reads):
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
exbytx <- exonsBy(txdb, by="tx", use.names=TRUE)

## Number of "compatible" transcripts per alignment in 'gal':
gal_ncomptx <- countCompatibleOverlaps(gal, exbytx)
mcols(gal)$ncomptx <- gal_ncomptx
table(gal_ncomptx)
mean(gal_ncomptx >= 1)
## --> 33% of the alignments in 'gal' are "compatible" with at least
## 1 transcript in 'exbytx'.

## Keep only alignments compatible with at least 1 transcript in
## 'exbytx':
compgal <- gal[gal_ncomptx >= 1]
head(compgal)

Run the code above in your browser using DataLab