NOTE: Until BioC 2.13, findMateAlignment was the workhorse used
by readGAlignmentPairs for pairing the records loaded
from a BAM file containing aligned paired-end reads.
Starting with BioC 2.14, readGAlignmentPairs relies
on scanBam(BamFile(asMates=TRUE), ...) for the
pairing.
findMateAlignment(x)
makeGAlignmentPairs(x, use.names=FALSE, use.mcols=FALSE, strandMode=1)

## Related low-level utilities:
getDumpedAlignments()
countDumpedAlignments()
flushDumpedAlignments()
x: A named GAlignments object with metadata columns flag,
mrnm, and mpos. Typically obtained by loading aligned
paired-end reads from a BAM file with:
param <- ScanBamParam(what=c("flag", "mrnm", "mpos"))
x <- readGAlignments(..., use.names=TRUE, param=param)
strandMode: The strand mode to set on the returned GAlignmentPairs object. See ?strandMode for more information.

For findMateAlignment: An integer vector of the same length as
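x, containing only positive or NA values, where the i-th element
is interpreted as follows: an NA value means that no mate, or more than one mate, was found for x[i]; a non-NA value j gives the index in x of x[i]'s mate.
For makeGAlignmentPairs: A GAlignmentPairs object where the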
pairs are formed internally by calling findMateAlignment on x.
For getDumpedAlignments: NULL or a GAlignments object
containing the dumped alignments. See the "Dumped alignments" subsection
below for the details.
For countDumpedAlignments: The number of dumped alignments.
Nothing for flushDumpedAlignments.
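For unambiguous pairs, the integer vector returned by findMateAlignment is expected to be symmetric: if the i-th element is j, then the j-th element is i. The base-R sketch below shows one way to read such a vector, turning it into a two-column matrix of index pairs. It assumes a hand-made mate vector rather than real output, and the helper mate_vector_to_pairs is hypothetical, not part of GenomicAlignments.

```r
# Hypothetical helper (not part of GenomicAlignments): turn a mate vector,
# as described above, into a two-column matrix listing each pair once.
mate_vector_to_pairs <- function(mate) {
  i <- seq_along(mate)
  # Keep records that have a mate, and report each pair from its smaller index
  keep <- !is.na(mate) & i < mate
  cbind(first = i[keep], second = mate[keep])
}

# Hand-made mate vector: 1 <-> 3 and 2 <-> 5 are mates, record 4 has no mate
mate <- c(3L, 5L, 1L, NA, 2L)

# Every non-NA link should point back to its origin
linked <- which(!is.na(mate))
stopifnot(all(mate[mate[linked]] == linked))

pairs <- mate_vector_to_pairs(mate)
pairs
```

With real data, mate would come from findMateAlignment(x) and the row indices would refer to records of x.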
findMateAlignment is the workhorse used by makeGAlignmentPairs
for pairing the records loaded from a BAM file containing aligned paired-end
reads. It implements the following pairing algorithm:
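Only records that correspond to paired-end reads with both ends mapped (flag bit 0x1 set, flag bits 0x4 and 0x8 unset, as defined in the SAM specification) are candidates for pairing; findMateAlignment will ignore any other record. That is,
records that correspond to single-end reads, or records that
correspond to paired-end reads where one or both ends are unmapped,
are discarded.
To ignore secondary alignments, set the isSecondaryAlignment flag to FALSE in
ScanBamParam(). See examples in ?readGAlignmentPairs for how
to do this.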
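The candidate rule (paired-end reads with both ends mapped) can be checked directly on integer FLAG values using SAM flag bits 0x1, 0x4 and 0x8. A minimal base-R sketch, assuming the flags are available as an integer vector (for example, the flag metadata column of x); is_pairing_candidate is a hypothetical helper, not the package's internal code:

```r
# SAM flag bits (as defined in the SAM specification)
FLAG_PAIRED        <- 0x1L  # template has multiple segments (paired-end read)
FLAG_UNMAPPED      <- 0x4L  # this segment is unmapped
FLAG_MATE_UNMAPPED <- 0x8L  # next segment (the mate) is unmapped

# Hypothetical helper: TRUE for paired-end records where both ends are mapped
is_pairing_candidate <- function(flag) {
  flag <- as.integer(flag)
  bitwAnd(flag, FLAG_PAIRED) != 0L &
    bitwAnd(flag, FLAG_UNMAPPED) == 0L &
    bitwAnd(flag, FLAG_MATE_UNMAPPED) == 0L
}

# 163 and 83: both ends mapped; 0: single-end read;
# 73: mate unmapped; 133: this read unmapped
flags <- c(163L, 83L, 0L, 73L, 133L)
is_pairing_candidate(flags)
```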
For example, here are 4 records (loaded in a GAlignments object) that cannot be paired with the above algorithm:
Showing the 4 records as a GAlignments object of length 4:
GAlignments with 4 alignments and 2 metadata columns:
seqnames strand cigar qwidth start end
As you can see, the aligner has aligned the same pair to the same
location twice! The only difference between the 2 aligned pairs is in
the CIGAR, i.e., one end of the pair is aligned twice to the same location
with exactly the same CIGAR while the other end of the pair is aligned
twice to the same location but with slightly different CIGARs.
Now showing the corresponding flag bits:

     isPaired isProperPair isUnmappedQuery hasUnmappedMate isMinusStrand
[1,]        1            1               0               0             0
[2,]        1            1               0               0             0
[3,]        1            1               0               0             1
[4,]        1            1               0               0             1
     isMateMinusStrand isFirstMateRead isSecondMateRead isSecondaryAlignment
[1,]                 1               0                1                    0
[2,]                 1               0                1                    0
[3,]                 0               1                0                    0
[4,]                 0               1                0                    0
     isNotPassingQualityControls isDuplicate
[1,]                           0           0
[2,]                           0           0
[3,]                           0           0
[4,]                           0           0
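The bit matrix above can be reproduced from raw FLAG values with base R; on real data, Rsamtools' bamFlagAsBitMatrix() produces this kind of matrix. The sketch below assumes that rec(1) and rec(2) carry FLAG 163 and rec(3) and rec(4) carry FLAG 83, the values that match the bit patterns shown; flag_bits is a hypothetical helper.

```r
# Column names in the order of the SAM flag bits 0x1, 0x2, ..., 0x400
bit_names <- c("isPaired", "isProperPair", "isUnmappedQuery",
               "hasUnmappedMate", "isMinusStrand", "isMateMinusStrand",
               "isFirstMateRead", "isSecondMateRead", "isSecondaryAlignment",
               "isNotPassingQualityControls", "isDuplicate")

# Hypothetical helper: one row per flag value, one 0/1 column per bit
flag_bits <- function(flag) {
  masks <- as.integer(2^(seq_along(bit_names) - 1L))  # 1, 2, 4, ..., 1024
  bits <- outer(as.integer(flag), masks,
                function(f, m) as.integer(bitwAnd(f, m) != 0L))
  colnames(bits) <- bit_names
  bits
}

# 163 = 0x1 + 0x2 + 0x20 + 0x80 (second mate, mate on minus strand)
#  83 = 0x1 + 0x2 + 0x10 + 0x40 (first mate, on minus strand)
flags <- c(163L, 163L, 83L, 83L)  # assumed FLAGs of rec(1) .. rec(4)
flag_bits(flags)
```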
As you can see, rec(1) and rec(2) are second mates, rec(3) and rec(4) are both first mates. But looking at (A), (B), (C), (D), (E), (F), and (G), the pairs could be rec(1) <-> rec(3) and rec(2) <-> rec(4), or they could be rec(1) <-> rec(4) and rec(2) <-> rec(3). There is no way to disambiguate!
So findMateAlignment simply ignores (with a warning) those alignments
with ambiguous pairing, and dumps them in a place from which they can be
retrieved later (i.e. after findMateAlignment has returned) for
further examination (see "Dumped alignments" subsection below for the details).
In other words, alignments that cannot be paired unambiguously are not paired
at all. Concretely, this means that readGAlignmentPairs is
guaranteed to return a GAlignmentPairs object
where every pair was formed in a non-ambiguous way. Note that, in practice,
this approach doesn't seem to leave aside a lot of records because ambiguous
pairing events seem pretty rare.
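The rule "pair a record only when exactly one mate is found, otherwise leave it unpaired" can be illustrated with a deliberately simplified matcher. This base-R sketch is not findMateAlignment's algorithm: it assumes a toy data.frame with hypothetical columns qname, rname, pos, mrnm, mpos and first (TRUE for a first mate), made-up coordinates, and compares only the read name, reciprocal mate coordinates, and first/last status. For a read aligned twice to the same location, as in the example above, every record has two candidate mates and therefore gets NA.

```r
# Hypothetical, simplified matcher: NOT the algorithm used by findMateAlignment,
# only an illustration of "exactly one candidate mate, otherwise NA".
find_mates_simplified <- function(df) {
  mate <- rep(NA_integer_, nrow(df))
  for (i in seq_len(nrow(df))) {
    # Candidates: same read name, reciprocal mate coordinates,
    # and opposite first/last segment status
    hits <- which(df$qname == df$qname[i] &
                  df$rname == df$mrnm[i] & df$pos == df$mpos[i] &
                  df$mrnm == df$rname[i] & df$mpos == df$pos[i] &
                  df$first != df$first[i])
    # Only a single, unambiguous candidate is accepted
    if (length(hits) == 1L) mate[i] <- hits
  }
  mate
}

# Read "r1" aligned twice at the same location: two first mates, two second mates
ambiguous <- data.frame(
  qname = "r1", rname = "chr1", pos = c(100L, 100L, 250L, 250L),
  mrnm = "chr1", mpos = c(250L, 250L, 100L, 100L),
  first = c(FALSE, FALSE, TRUE, TRUE), stringsAsFactors = FALSE
)
find_mates_simplified(ambiguous)

# A cleanly aligned pair for comparison
clean <- data.frame(
  qname = "r2", rname = "chr1", pos = c(100L, 250L),
  mrnm = "chr1", mpos = c(250L, 100L), first = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)
find_mates_simplified(clean)
```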
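Dumped alignments: alignments with ambiguous pairing are dumped in "the dump environment", from which they can be retrieved with getDumpedAlignments() after findMateAlignment has returned.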
Two additional utilities are provided for manipulation of the dumped
alignments: countDumpedAlignments for counting them (a fast equivalent
to length(getDumpedAlignments())), and flushDumpedAlignments to
flush "the dump environment". Note that "the dump environment" is
automatically flushed at the beginning of a call to findMateAlignment.
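The dump mechanism described above follows a common R pattern: an environment that is reset at the start of each pairing call and appended to when ambiguous records are found. A minimal base-R sketch of that pattern, assuming data.frame records; all names here (dump_env, dump_records, get_dumped, count_dumped, flush_dumped, pair_records) are hypothetical and are not the package's internals.

```r
# Hypothetical dump environment holding records set aside during pairing
dump_env <- new.env(parent = emptyenv())
dump_env$records <- list()

flush_dumped <- function() {
  dump_env$records <- list()
  invisible(NULL)
}

dump_records <- function(recs) {
  dump_env$records[[length(dump_env$records) + 1L]] <- recs
  invisible(NULL)
}

# NULL when nothing was dumped, otherwise all dumped rows combined
get_dumped <- function() {
  if (length(dump_env$records) == 0L) return(NULL)
  do.call(rbind, dump_env$records)
}

# Counts dumped rows without building the combined object
count_dumped <- function() {
  sum(vapply(dump_env$records, NROW, integer(1)))
}

# A pairing step flushes the dump first, then dumps the ambiguous records
pair_records <- function(df, ambiguous) {
  flush_dumped()
  if (any(ambiguous)) dump_records(df[ambiguous, , drop = FALSE])
  df[!ambiguous, , drop = FALSE]
}

df <- data.frame(id = 1:5)
kept <- pair_records(df, ambiguous = c(TRUE, TRUE, FALSE, FALSE, TRUE))
count_dumped()
get_dumped()
```

Because the dump is flushed at the start of every call, only the records set aside by the most recent call can be retrieved.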
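See also readGAlignments and readGAlignmentPairs.

library(GenomicAlignments)  # also attaches Rsamtools (ScanBamParam, ex1.bam)

## Locate the example BAM file shipped with Rsamtools
bamfile <- system.file("extdata", "ex1.bam", package="Rsamtools",
                       mustWork=TRUE)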
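## Load the flag, mate reference name (mrnm) and mate position (mpos)
## fields needed for pairing
param <- ScanBamParam(what=c("flag", "mrnm", "mpos"))
x <- readGAlignments(bamfile, use.names=TRUE, param=param)

## Index of each record's mate (NA if no mate or more than one mate)
mate <- findMateAlignment(x)
head(mate)
table(is.na(mate))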
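## Pair the records into a GAlignmentPairs object
galp0 <- makeGAlignmentPairs(x)

## Same, but keep the names and propagate the "flag" metadata column
galp <- makeGAlignmentPairs(x, use.names=TRUE, use.mcols="flag")
galp
colnames(mcols(galp))
colnames(mcols(first(galp)))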
colnames(mcols(last(galp)))