BamFile

Maintain and use BAM files

Use BamFile() to create a reference to a BAM file (and optionally its index). The reference remains open across calls to methods, avoiding costly index re-loading.

BamFileList() provides a convenient way of managing a list of BamFile instances.

Keywords
classes
Usage
## Constructors
BamFile(file, index=file, ..., yieldSize=NA_integer_, obeyQname=FALSE,
        asMates=FALSE, qnamePrefixEnd=NA, qnameSuffixStart=NA)
BamFileList(..., yieldSize=NA_integer_, obeyQname=FALSE, asMates=FALSE,
            qnamePrefixEnd=NA, qnameSuffixStart=NA)
## Opening / closing
open(con, ...)
close(con, ...)
## accessors; also path(), index(), yieldSize()
isOpen(con, rw="")
isIncomplete(con)
obeyQname(object, ...)
obeyQname(object, ...) <- value
asMates(object, ...)
asMates(object, ...) <- value
qnamePrefixEnd(object, ...)
qnamePrefixEnd(object, ...) <- value
qnameSuffixStart(object, ...)
qnameSuffixStart(object, ...) <- value
## actions
scanBamHeader(files, ..., what=c("targets", "text"))
seqinfo(x)
filterBam(file, destination, index=file, ..., filter=FilterRules(),
          indexDestination=TRUE, param=ScanBamParam(what=scanBamWhat()))
indexBam(files, ...)
sortBam(file, destination, ..., byQname=FALSE, maxMemory=512)
mergeBam(files, destination, ...)
## reading
scanBam(file, index=file, ..., param=ScanBamParam(what=scanBamWhat()))
## counting
countBam(file, index=file, ..., param=ScanBamParam())
quickBamFlagSummary(file, ..., param=ScanBamParam(), main.groups.only=FALSE)
Arguments
...
Additional arguments.

For BamFileList, this can either be a single character vector of paths to BAM files, or several instances of BamFile objects. When a character vector of paths, a second named argument ‘index’ can be a character() vector of length equal to the first argument specifying the paths to the index files, or character() to indicate that no index file is available. See BamFile.

con
An instance of BamFile.
x, object, file, files
A character vector of BAM file paths (for BamFile) or a BamFile instance (for other methods).
index
character(1); the BAM index file path (for BamFile); ignored for all other methods on this page.
yieldSize
Number of records to yield each time the file is read from with scanBam. See ‘Fields’ section for details.
asMates
Logical indicating if records should be paired as mates. See ‘Fields’ section for details.
qnamePrefixEnd
Single character (or NA) marking the end of the qname prefix. When specified, all characters prior to and including the qnamePrefixEnd are removed from the qname. If the prefix is not found in the qname, the qname is not trimmed. Currently only implemented for mate-pairing (i.e., when asMates=TRUE in a BamFile).
qnameSuffixStart
Single character (or NA) marking the start of the qname suffix. When specified, all characters following and including the qnameSuffixStart are removed from the qname. If the suffix is not found in the qname, the qname is not trimmed. Currently only implemented for mate-pairing (i.e., when asMates=TRUE in a BamFile).
obeyQname
Logical indicating if the BAM file is sorted by qname. In Bioconductor > 2.12 paired-end files do not need to be sorted by qname. Instead use asMates=TRUE for reading paired-end data. See ‘Fields’ section for details.
value
Replacement value: a logical for setting asMates and obeyQname in a BamFile instance, or a single character (or NA) for setting qnamePrefixEnd and qnameSuffixStart.
what
For scanBamHeader, a character vector specifying that either or both of c("targets", "text") are to be extracted from the header; see scanBam for additional detail.
filter
A FilterRules instance. Functions in the FilterRules instance should expect a single DataFrame argument representing all information specified by param. Each function must return a logical vector, usually of length equal to the number of rows of the DataFrame. Return values are used to include (when TRUE) corresponding records in the filtered BAM file.
destination
character(1) file path to write filtered reads to.
indexDestination
logical(1) indicating whether the destination file should also be indexed.
byQname, maxMemory
See sortBam.
param
An optional ScanBamParam instance to further influence scanning, counting, or filtering.
rw
Mode of file; ignored.
main.groups.only
See quickBamFlagSummary.
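
The trimming rules described for qnamePrefixEnd and qnameSuffixStart can be illustrated outside Rsamtools with a small base R sketch. The helper trim_qname and the sample qnames are hypothetical, and the sketch assumes that the first occurrence of each marker is the one that counts; the package performs the trimming internally and may differ in such details.

```r
## Hypothetical helper mimicking the documented trimming rules.
## Assumption: the first occurrence of each marker character is used.
trim_qname <- function(qname, prefixEnd = NA, suffixStart = NA) {
    if (!is.na(prefixEnd)) {
        pos <- as.integer(regexpr(prefixEnd, qname, fixed = TRUE))
        ## drop everything up to and including the marker, when present
        qname <- ifelse(pos > 0, substring(qname, pos + 1), qname)
    }
    if (!is.na(suffixStart)) {
        pos <- as.integer(regexpr(suffixStart, qname, fixed = TRUE))
        ## drop the marker and everything after it, when present
        qname <- ifelse(pos > 0, substring(qname, 1, pos - 1), qname)
    }
    qname
}

## Made-up qnames: two mates made unique by a suffix, and one without markers.
qnames <- c("file1/read7.1", "file1/read7.2", "read8")
trimmed <- trim_qname(qnames, prefixEnd = "/", suffixStart = ".")
print(trimmed)
```

After trimming, the first two records share a qname and so become candidates for mating, while the third is left unchanged because neither marker occurs in it.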

Objects from the Class

Objects are created by calls of the form BamFile().

Fields

The BamFile class inherits fields from the RsamtoolsFile class and has fields:

yieldSize:
Number of records to yield each time the file is read from using scanBam or, when length(bamWhich()) != 0, a threshold which yields records in complete ranges whose sum first exceeds yieldSize. Setting yieldSize on a BamFileList does not alter existing yield sizes set on the individual BamFile instances.
asMates:
A logical indicating if the records should be returned as mated pairs. When TRUE scanBam attempts to mate (pair) the records and returns two additional fields groupid and mate_status. groupid is an integer vector of unique group ids; mate_status is a factor with level mated for records successfully paired by the algorithm, ambiguous for records that are possibly mates but cannot be assigned unambiguously, or unmated for reads that did not have valid mates. Mate criteria:
  • Bit 0x40 and 0x80: Segments are a pair of first/last OR neither segment is marked first/last
  • Bit 0x100: Both segments are secondary OR both not secondary
  • Bit 0x10 and 0x20: Segments are on opposite strands
  • mpos match: segment1 mpos matches segment2 pos AND segment2 mpos matches segment1 pos
  • tid match
Flags, tags and ranges may be specified in the ScanBamParam for fine tuning of results.
obeyQname:
A logical(1) indicating if the file was sorted by qname. In Bioconductor > 2.12 paired-end files do not need to be sorted by qname. Instead set asMates=TRUE in the BamFile when using the readGAlignmentsList function from the GenomicAlignments package.
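
The mate criteria listed under asMates can be read as a predicate on the flag bits and positions of two records. The base R sketch below is one such reading, not the package's implementation: is_mate_pair and the example records are hypothetical, and "opposite strands" is interpreted here as the two records differing in bit 0x10.

```r
## Hypothetical helper: TRUE when two records satisfy the listed criteria.
## Each record is a list with integer fields flag, tid, pos and mpos.
is_mate_pair <- function(a, b) {
    has <- function(rec, bit) bitwAnd(rec$flag, bit) != 0L

    ## 0x40 / 0x80: one first and one last segment, or neither record marked
    first_last <- (has(a, 0x40L) && has(b, 0x80L)) ||
                  (has(a, 0x80L) && has(b, 0x40L))
    unmarked <- !has(a, 0x40L) && !has(a, 0x80L) &&
                !has(b, 0x40L) && !has(b, 0x80L)

    ## 0x100: both secondary or both not secondary
    same_secondary <- has(a, 0x100L) == has(b, 0x100L)

    ## 0x10: assumed reading of "opposite strands"
    opposite_strand <- has(a, 0x10L) != has(b, 0x10L)

    ## mpos and tid must agree between the two records
    mpos_match <- a$mpos == b$pos && b$mpos == a$pos
    tid_match <- a$tid == b$tid

    (first_last || unmarked) && same_secondary && opposite_strand &&
        mpos_match && tid_match
}

## Made-up records: flags 99 and 147 are a typical first/last pair.
rec1 <- list(flag = 99L,  tid = 0L, pos = 100L, mpos = 250L)
rec2 <- list(flag = 147L, tid = 0L, pos = 250L, mpos = 100L)
rec3 <- list(flag = 147L, tid = 0L, pos = 300L, mpos = 100L)

is_mate_pair(rec1, rec2)   # all criteria hold
is_mate_pair(rec1, rec3)   # mpos of rec1 does not match pos of rec3
```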

Functions and methods

BamFileList inherits additional methods from RsamtoolsFileList and SimpleList.

Opening / closing:

open.BamFile
Opens the (local or remote) path and, if bamIndex is not character(0), the index file. Returns a BamFile instance.
close.BamFile
Closes the BamFile con, returning (invisibly) the updated BamFile. The instance may be re-opened with open.BamFile.
isOpen
Tests whether the BamFile con has been opened for reading.
isIncomplete
Tests whether the BamFile con is neither closed nor at the end of the file.
Accessors:
path
Returns a character(1) vector of BAM path names.
index
Returns a character(0) or character(1) vector of BAM index path names.
yieldSize, yieldSize<-
Return or set an integer(1) vector indicating yield size.
obeyQname, obeyQname<-
Return or set a logical(1) indicating if the file was sorted by qname.
asMates, asMates<-
Return or set a logical(1) indicating if the records should be returned as mated pairs.
Methods:
scanBamHeader
Visit the path in path(file), returning the information contained in the file header; see scanBamHeader.
seqinfo, seqnames, seqlengths
Visit the path in path(file), returning a Seqinfo, character, or named integer vector containing information on the names and / or lengths of each sequence. Seqnames are ordered as they appear in the file.
scanBam
Visit the path in path(file), returning the result of scanBam applied to the specified path.
countBam
Visit the path(s) in path(file), returning the result of countBam applied to the specified path.
filterBam
Visit the path in path(file), returning the result of filterBam applied to the specified path. A single file can be filtered to one or several destinations, as described in filterBam.
indexBam
Visit the path in path(file), returning the result of indexBam applied to the specified path.
sortBam
Visit the path in path(file), returning the result of sortBam applied to the specified path.
mergeBam
Merge several BAM files into a single BAM file. See mergeBam for details; additional arguments supported by mergeBam,character-method are also available for BamFileList.
show
Compactly display the object.
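
As an illustration of the filterBam method and its filter argument, the sketch below writes a filtered copy of the example BAM file shipped with Rsamtools. The rule name highMapq and the threshold of 30 are hypothetical choices made for this sketch; the rule function receives a DataFrame holding the fields requested in param and returns one logical per record.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
bf <- BamFile(fl)

## Hypothetical rule: keep records whose mapping quality is at least 30.
## 'x' is a DataFrame with the fields named in 'param' (here only mapq).
filter <- FilterRules(list(
    highMapq = function(x) !is.na(x$mapq) & x$mapq >= 30
))

## filterBam() returns the path of the destination file.
dest <- filterBam(bf, tempfile(fileext=".bam"), filter=filter,
                  param=ScanBamParam(what="mapq"))

countBam(fl)$records      # records in the original file
countBam(dest)$records    # records that passed the filter
```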

See Also

Aliases
  • BamFile-class
  • BamFileList-class
  • BamFile
  • BamFileList
  • open.BamFile
  • close.BamFile
  • isOpen,BamFile-method
  • isIncomplete,BamFile-method
  • scanBamHeader,BamFile-method
  • seqinfo,BamFile-method
  • seqinfo,BamFileList-method
  • obeyQname
  • obeyQname<-
  • obeyQname,BamFile-method
  • obeyQname<-,BamFile-method
  • obeyQname,BamFileList-method
  • obeyQname<-,BamFileList-method
  • asMates
  • asMates<-
  • asMates,BamFile-method
  • asMates<-,BamFile-method
  • asMates,BamFileList-method
  • asMates<-,BamFileList-method
  • qnamePrefixEnd
  • qnamePrefixEnd<-
  • qnamePrefixEnd,BamFile-method
  • qnamePrefixEnd<-,BamFile-method
  • qnamePrefixEnd,BamFileList-method
  • qnamePrefixEnd<-,BamFileList-method
  • qnameSuffixStart
  • qnameSuffixStart<-
  • qnameSuffixStart,BamFile-method
  • qnameSuffixStart<-,BamFile-method
  • qnameSuffixStart,BamFileList-method
  • qnameSuffixStart<-,BamFileList-method
  • scanBam,BamFile-method
  • countBam,BamFile-method
  • countBam,BamFileList-method
  • filterBam,BamFile-method
  • indexBam,BamFile-method
  • sortBam,BamFile-method
  • mergeBam,BamFileList-method
  • quickBamFlagSummary,BamFile-method
  • show,BamFile-method
  • show,BamFileList-method
Examples

##
## BamFile options.
##

fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
bf <- BamFile(fl)
bf

## When 'asMates=TRUE' scanBam() reads the data in as
## pairs. See 'asMates' above for details of the pairing
## algorithm.
asMates(bf) <- TRUE

## When 'yieldSize' is set, scanBam() will iterate
## through the file in chunks.
yieldSize(bf) <- 500 

## Some applications append a filename (e.g., NCBI Sequence Read 
## Archive (SRA) toolkit) or allele identifier to the sequence qname.
## This may result in a unique qname for each record which presents a
## problem when mating paired-end reads (identical qnames are one
## criterion for paired-end mating). 'qnamePrefixEnd' and 
## 'qnameSuffixStart' can be used to trim an unwanted prefix or suffix.
qnamePrefixEnd(bf) <- "/"
qnameSuffixStart(bf) <- "." 

##
## Reading Bam files.
##

fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
                  mustWork=TRUE)
(bf <- BamFile(fl))
head(seqlengths(bf))                    # sequences and lengths in BAM file

if (require(RNAseqData.HNRNPC.bam.chr14)) {
    bfl <- BamFileList(RNAseqData.HNRNPC.bam.chr14_BAMFILES)
    bfl
    bfl[1:2]                            # subset
    bfl[[1]]                            # select first element -- BamFile
    ## merged across BAM files
    seqinfo(bfl)
    head(seqlengths(bfl))
}


length(scanBam(fl)[[1]][[1]])  # all records

bf <- open(BamFile(fl))        # implicit index
bf
identical(scanBam(bf), scanBam(fl))
close(bf)

## Use 'yieldSize' to iterate through a file in chunks.
bf <- open(BamFile(fl, yieldSize=1000)) 
while (nrec <- length(scanBam(bf)[[1]][[1]]))
    cat("records:", nrec, "\n")
close(bf)

## Repeatedly visit multiple ranges in the BamFile. 
rng <- GRanges(c("seq1", "seq2"), IRanges(1, c(1575, 1584)))
bf <- open(BamFile(fl))
sapply(seq_along(rng), function(i, bamFile, rng) {
    param <- ScanBamParam(which=rng[i], what="seq")
    bam <- scanBam(bamFile, param=param)[[1]]
    alphabetFrequency(bam[["seq"]], baseOnly=TRUE, collapse=TRUE)
}, bf, rng)
close(bf)

Documentation reproduced from package Rsamtools, version 1.24.0, License: Artistic-2.0 | file LICENSE
