Learn R Programming

GenomicAlignments (version 1.6.1)

pileLettersAt: Pile the letters of a set of aligned reads on top of a set of individual genomic positions

Description

pileLettersAt extracts the letters/nucleotides of a set of reads that align to a set of individual genomic positions of interest. The extracted letters are returned as "piles of letters" (one per genomic position of interest) stored in an XStringSet (typically DNAStringSet) object.

Usage

pileLettersAt(x, seqnames, pos, cigar, at)

Arguments

x
An XStringSet (typically DNAStringSet) object containing N unaligned read sequences (a.k.a. the query sequences) reported with respect to the + strand.
seqnames
A factor-Rle parallel to x. For each i, seqnames[i] must be the name of the reference sequence of the i-th alignment.
pos
An integer vector parallel to x. For each i, pos[i] must be the 1-based position on the reference sequence of the first aligned letter in x[[i]].
cigar
A character vector parallel to x. Contains the extended CIGAR strings of the alignments.
at
A GRanges object containing the individual genomic positions of interest. seqlevels(at) must be identical to levels(seqnames).

Value

An XStringSet (typically DNAStringSet) object parallel to at (i.e. with 1 string per individual genomic position).

Details

x, seqnames, pos, cigar must be 4 parallel vectors describing N aligned reads.

See Also

Examples

Run this code
## Input

##   - A BAM file:
bamfile <- BamFile(system.file("extdata", "ex1.bam", package="Rsamtools"))
seqinfo(bamfile)  # to see the seqlevels and seqlengths
stackStringsFromBam(bamfile, param="seq1:1-21")  # a quick look at
                                                 # the reads

##   - A GRanges object containing Individual Genomic Positions Of
##     Interest:
my_IGPOI <- GRanges(Rle(c("seq1", "seq2"), c(7, 2)),
                    IRanges(c(1:5, 21, 1575, 1513:1514), width=1))

## Some preliminary massage on 'my_IGPOI'

seqinfo(my_IGPOI) <- merge(seqinfo(my_IGPOI), seqinfo(bamfile))
seqlevels(my_IGPOI) <- seqlevelsInUse(my_IGPOI)

## Load the BAM file in a GAlignments object. We load only the reads
## aligned to the sequences in 'seqlevels(my_IGPOI)' and we filter out
## reads not passing quality controls (flag bit 0x200) and PCR or
## optical duplicates (flag bit 0x400). See ?ScanBamParam and the SAM
## Spec for more information. 

which <- as(seqinfo(my_IGPOI), "GRanges")
flag <- scanBamFlag(isNotPassingQualityControls=FALSE,
                    isDuplicate=FALSE)
what <- c("seq", "qual")
param <- ScanBamParam(flag=flag, what=c("seq", "qual"), which=which)
gal <- readGAlignments(bamfile, param=param)
seqlevels(gal) <- seqlevels(my_IGPOI) 

## Extract the read sequences (a.k.a. query sequences) and quality
## strings. Both are reported with respect to the + strand.

qseq <- mcols(gal)$seq
qual <- mcols(gal)$qual

nucl_piles <- pileLettersAt(qseq, seqnames(gal), start(gal), cigar(gal),
                            my_IGPOI)
qual_piles <- pileLettersAt(qual, seqnames(gal), start(gal), cigar(gal),
                            my_IGPOI)
mcols(my_IGPOI)$nucl_piles <- nucl_piles
mcols(my_IGPOI)$qual_piles <- qual_piles
my_IGPOI 

## Finally, to summarize A/C/G/T frequencies at each position:
alphabetFrequency(nucl_piles, baseOnly=TRUE)

Run the code above in your browser using DataLab