dada2 (version 1.0.3)

derepFastq: Read in and dereplicate a fastq file.

Description

A custom interface to FastqStreamer for dereplicating amplicon sequences from fastq or compressed fastq files. It also controls the peak memory requirement, so that large files are supported.
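The chunked strategy can be illustrated with a minimal base-R sketch. It is not dada2's implementation. It assumes the reads are already in a character vector, so there is no fastq parsing and no quality information, and the helper name `derep_chunked` is hypothetical. Each chunk of at most `n` reads is tabulated, and the per-chunk counts are merged into a running total. Only one chunk plus the table of uniques needs to be held at a time, and the chunk size does not change the final abundances.

```r
# Hypothetical helper (not part of dada2): dereplicate a character vector of
# reads in chunks of at most `n`, merging the per-chunk abundance counts.
derep_chunked <- function(reads, n = 1e6) {
  counts <- integer(0)  # running named vector: sequence -> abundance
  if (length(reads) == 0) return(counts)
  for (s in seq(1, length(reads), by = n)) {
    chunk <- reads[s:min(s + n - 1, length(reads))]
    tab <- table(chunk)  # unique sequences within this chunk only
    chunk_counts <- setNames(as.integer(tab), names(tab))
    # Merge: sum abundances of sequences seen in earlier chunks and in this one.
    combined <- c(counts, chunk_counts)
    counts <- vapply(split(combined, names(combined)), sum, integer(1))
  }
  # Order by decreasing abundance (ties broken alphabetically).
  counts[order(-counts, names(counts))]
}

reads <- c("ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC")
all.equal(derep_chunked(reads), derep_chunked(reads, n = 2))  # TRUE
```

Merging by sequence name is what makes the result independent of the chunk size, which is the property the Examples section checks on the real function.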

Usage

derepFastq(fls, n = 1e+06, verbose = FALSE)

Arguments

fls
(Required). character. The file path(s) to the fastq or fastq.gz file(s). In fact, any file format supported by FastqStreamer can be used.
n
(Optional). numeric(1). The maximum number of records (reads) to parse and dereplicate at any one time. This controls the peak memory requirement, so that large fastq files are supported. Default is 1e6 (one million reads). This parameter is passed on to FastqStreamer; see its documentation for details.
verbose
(Optional). logical(1). Default FALSE. If TRUE, emit standard R messages reporting the intermediate and final status of the dereplication.
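The `n` argument bounds how many reads are held in memory at once. As an illustration only, here is a minimal base-R sketch of chunked fastq reading. It is not how FastqStreamer is implemented. It assumes well-formed four-line fastq records (no line-wrapped sequences), reads at most `4 * n` lines at a time from a connection, and applies a function to the sequence lines of each chunk. `stream_fastq_seqs` is a hypothetical name.

```r
# Hypothetical helper (not FastqStreamer's implementation): stream a fastq or
# gzipped fastq file, applying FUN to the sequences of at most n reads at a time.
# Assumes well-formed four-line records (no line-wrapped sequences).
stream_fastq_seqs <- function(path, n = 1e6, FUN = identity) {
  con <- gzfile(path, open = "r")  # gzfile() can also read uncompressed files
  on.exit(close(con))
  results <- list()
  repeat {
    lines <- readLines(con, n = 4 * n)  # 4 lines per fastq record
    if (length(lines) == 0) break
    seqs <- lines[seq(2, length(lines), by = 4)]  # line 2 holds the sequence
    results[[length(results) + 1]] <- FUN(seqs)
  }
  results
}

# Write a small gzipped fastq file with five reads, then stream it 2 reads at a time.
tmp <- tempfile(fileext = ".fastq.gz")
out <- gzfile(tmp, open = "w")
reads <- c("ACGT", "ACGT", "TTGA", "ACGT", "TTGA")
for (i in seq_along(reads)) {
  writeLines(c(paste0("@read", i), reads[i], "+", strrep("I", nchar(reads[i]))), out)
}
close(out)
unlist(stream_fastq_seqs(tmp, n = 2, FUN = length))  # 2 2 1
```

A smaller `n` lowers peak memory at the cost of more read-and-merge iterations. A larger `n` does the reverse.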

Value

A derep-class object, or a list of such objects when more than one file is supplied in fls.

Examples

library(dada2)
# Test that the chunk size, `n`, does not affect the result.
testFastq = system.file("extdata", "sam1F.fastq.gz", package="dada2")
derep1 = derepFastq(testFastq, verbose = TRUE)
derep1.35 = derepFastq(testFastq, n = 35, verbose = TRUE)
# Reorder the uniques from the small-chunk run to match derep1 before comparing.
all.equal(getUniques(derep1), getUniques(derep1.35)[names(getUniques(derep1))])
