loadCoverage: Load the coverage information from a group of BAM files

Description

For a group of samples this function reads the coverage information for a specific chromosome directly from the BAM files. It then merges them into a DataFrame and removes the bases that do not pass the cutoff.

Usage

loadCoverage(files, chr, cutoff = NULL, filter = "one", chrlen = NULL,
  output = NULL, bai = NULL, ...)

Arguments

files

A character vector with the full path to the sample BAM files (or BigWig files). The names are used for the column names of the DataFrame. Check rawFiles for constructing files. files can also be a BamFileList, BamFile, BigWigFileList, or BigWigFile object.

chr

Chromosome to read. Should be in the format matching the one used in the raw data.

cutoff

This argument is passed to filterData.

filter

Has to be either 'one' (default) or 'mean'. In the first case, at least one sample has to have coverage above cutoff. In the second case, the mean coverage has to be greater than cutoff.

chrlen

The chromosome length in base pairs. If it's NULL, the chromosome length is extracted from the BAM files.

output

If NULL then no output is saved in disk. If auto then an automatic name is constructed using UCSC names (chrXCovInfo.Rdata for example). If another character is specified, then that name is used for #' the output file.

bai

The full path to the BAM index files. If NULL it is assumed that the BAM index files are in the same location as the BAM files and that they have the .bai extension. Ignored if files is a BamFileList object or if inputType=='BigWig'.

...

Arguments passed to other methods and/or advanced arguments.

Value

A list with two components. [object Object],[object Object]

Details

The ... argument can be used to control which alignments to consider when reading from BAM files. See scanBamFlag.

Parallelization for loading the data in chunks is used only used when tilewidth is specified. You may use up to one core per tile.

If you set the advanced argument drop.D = TRUE, bases with CIGAR string "D" (deletion from reference) will be excluded from the base-level coverage calculation.

If you are working with data from an organism different from 'Homo sapiens' specify so by setting the global 'species' and 'chrsStyle' options. For example: options(species = 'arabidopsis_thaliana') options(chrsStyle = 'NCBI')

Examples

Run this code

datadir <- system.file('extdata', 'genomeData', package='derfinder')
files <- rawFiles(datadir = datadir, samplepatt = '*accepted_hits.bam$', 
    fileterm = NULL)
## Shorten the column names
names(files) <- gsub('_accepted_hits.bam', '', names(files))
 
## Read and filter the data, only for 2 files
dataSmall <- loadCoverage(files = files[1:2], chr = '21', cutoff = 0)

## Export to BigWig files
createBw(list('chr21' = dataSmall))

## Load data from BigWig files
dataSmall.bw <- loadCoverage(c(ERR009101 = 'ERR009101.bw', ERR009102 = 
    'ERR009102.bw'), chr = 'chr21')

## Compare them
mapply(function(x, y) { x - y }, dataSmall$coverage, dataSmall.bw$coverage)

## Note that the only difference is the type of Rle (integer vs numeric) used
## to store the data.

Run the code above in your browser using DataLab