filterFastq: Filter fastq from one file to another

Description

filterFastq filters reads from source to destination file(s) applying a filter to reads in each file. The filter can be a function or FilterRules instance; operations are done in a memory-efficient manner.

Usage

filterFastq(files, destinations, ..., filter = FilterRules(), compress=TRUE, yieldSize = 1000000L)

Arguments

files

a character vector of valid file paths.

destinations

a character vector of destinations, recycled to be the same length as files. destinations must not already exist.

...

Additional arguments, perhaps used by a filter function.

filter

A simple function taking as it's first argument a ShortReadQ instance and returning a modified ShortReadQ instance (e.g., with records or nucleotides removed), or a FilterRules instance specifying which records are to be removed.

compress

A logical(1) indicating whether the file should be gz-compressed. The default is TRUE.

yieldSize

Number of fastq records processed in each call to filter; increase this for (marginally) more efficient I/O at the expense of increased memory use.

Examples

Run this code

## path to a convenient fastq file
sp <- SolexaPath(system.file('extdata', package='ShortRead'))
fl <- file.path(analysisPath(sp), "s_1_sequence.txt")

## filter reads to keep those with GC < 0.7
fun <- function(x) {
    gc <- alphabetFrequency(sread(x), baseOnly=TRUE)[,c("G", "C")]
    x[rowSums(gc) / width(x) < .7]    
}
filterFastq(fl, tempfile(), filter=fun)

## trimEnds,character-method uses filterFastq internally
trimEnds(fl, "V", destinations=tempfile())

Run the code above in your browser using DataLab