dada2 (version 1.0.3)

derepFastq: Read in and dereplicate a fastq file.


A custom interface to FastqStreamer for dereplicating amplicon sequences from fastq or compressed fastq files, while also controlling peak memory requirement to support large files.


derepFastq(fls, n = 1e+06, verbose = FALSE)


fls: (Required). character. The file path(s) to the fastq or fastq.gz file(s). Any file format supported by FastqStreamer is accepted.
n: (Optional). numeric(1). The maximum number of records (reads) to parse and dereplicate at any one time. This controls the peak memory requirement so that large fastq files are supported. Default is 1e6 (one million reads). This parameter is passed on to FastqStreamer; see its documentation for details.
verbose: (Optional). Default FALSE. If TRUE, emit standard R messages reporting the intermittent and final status of the dereplication.
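
The role of the chunk size can be illustrated with a small base-R sketch. This is a conceptual illustration only, not dada2's implementation: `derep_chunked` is a hypothetical helper that works on an in-memory character vector rather than streaming a fastq file, and it tracks only abundances (no quality scores). It shows why processing at most `n` reads at a time and merging the per-chunk counts should give the same abundances as a single pass.

```r
# Conceptual sketch, NOT the dada2 implementation.
# `derep_chunked` is a hypothetical helper: it dereplicates a character
# vector of reads in chunks of at most `n`, merging per-chunk counts
# into a running tally of unique sequences.
derep_chunked <- function(reads, n = 1e6) {
  counts <- integer(0)  # named integer vector: sequence -> abundance
  if (length(reads) == 0) return(counts)
  for (start in seq(1, length(reads), by = n)) {
    chunk <- reads[start:min(start + n - 1, length(reads))]
    tab <- table(chunk)
    chunk_counts <- setNames(as.integer(tab), names(tab))
    # Add to the counts of sequences already seen; append new sequences.
    seen <- names(chunk_counts) %in% names(counts)
    seen_names <- names(chunk_counts)[seen]
    counts[seen_names] <- counts[seen_names] + chunk_counts[seen]
    counts <- c(counts, chunk_counts[!seen])
  }
  sort(counts, decreasing = TRUE)
}

# Toy reads (made up for illustration).
reads <- c("ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC", "ACGT")
full  <- derep_chunked(reads)          # one chunk
small <- derep_chunked(reads, n = 2)   # four chunks
# Compare by name, since the order of tied sequences may differ.
stopifnot(identical(full, small[names(full)]))
```

Only one chunk plus the running tally of unique sequences is held at a time, which is the sense in which a smaller `n` lowers peak memory at the cost of more merge steps.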


A derep-class object or list of such objects.


# Test that chunk-size, `n`, does not affect the result.
testFastq = system.file("extdata", "sam1F.fastq.gz", package="dada2")
derep1 = derepFastq(testFastq, verbose = TRUE)
derep1.35 = derepFastq(testFastq, 35, TRUE)
all.equal(getUniques(derep1), getUniques(derep1.35)[names(getUniques(derep1))])
