filterData: Filter the positions of interest

Description

For a group of samples this function reads the coverage information for a specific chromosome directly from the BAM files. It then merges them into a DataFrame and removes the bases that do not pass the cutoff. This is a helper function for loadCoverage and preprocessCoverage.

Usage

filterData(data, cutoff = NULL, index = NULL, filter = "one",
  totalMapped = NULL, targetSize = 8e+07, ...)

Arguments

data

Either a list of Rle objects or a DataFrame with the coverage information.

cutoff

The base-pair level cutoff to use. It's behavior is controlled by filter.

index

A logical Rle with the positions of the chromosome that passed the cutoff. If NULL it is assumed that this is the first time using filterData and thus no previous index exists.

filter

Has to be either 'one' (default) or 'mean'. In the first case, at least one sample has to have coverage above cutoff. In the second case, the mean coverage has to be greater than cutoff.

totalMapped

The total number of reads mapped for each sample. Providing this data adjusts the coverage to reads in targetSize library prior to filtering.

targetSize

The target library size to adjust the coverage to. Used only when totalMapped is specified. By default, it adjusts to libraries with 80 million reads.

...

Arguments passed to other methods and/or advanced arguments.

Value

A list with up to three components.
[object Object],[object Object],[object Object]

Details

If cutoff is NULL then the data is grouped into DataFrame without applying any cutoffs. This can be useful if you want to use loadCoverage to build the coverage DataFrame without applying any cutoffs for other downstream purposes like plotting the coverage values of a given region. You can always specify the colsubset argument in preprocessCoverage to filter the data before calculating the F statistics.

Examples

Run this code

## Construct some toy data
library('IRanges')
x <- Rle(round(runif(1e4, max=10)))
y <- Rle(round(runif(1e4, max=10)))
z <- Rle(round(runif(1e4, max=10)))
DF <- DataFrame(x, y, z)

## Filter the data
filt1 <- filterData(DF, 5)
filt1

## Filter again but only using the first two samples
filt2 <- filterData(filt1$coverage[, 1:2], 5, index=filt1$position)
filt2

Run the code above in your browser using DataLab