ShortRead package are leveraged to do this filtering.
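fastqFilter(fn, fout, truncQ = 2, truncLen = 0, trimLeft = 0, maxN = 0, minQ = 0, maxEE = Inf, rm.phix = FALSE, n = 1e+06, compress = TRUE, verbose = FALSE, ...)

fout: The path to the output file. Note that by default (compress=TRUE) the output fastq file is gzipped.
truncQ: Truncate reads at the first instance of a quality score less than or equal to truncQ. The default value of 2 is a special quality score indicating the end of good quality sequence in Illumina 1.8+.
truncLen: Truncate reads after truncLen bases. Reads shorter than this are discarded. Note that dada currently requires all sequences to be the same length.
trimLeft: The number of nucleotides to remove from the start of each read. If both truncLen and trimLeft are provided, filtered reads will have length truncLen-trimLeft.
maxN: After truncation, sequences with more than maxN Ns will be discarded. Note that dada currently does not allow Ns.
maxEE: Default Inf (no EE filtering). After truncation, reads with higher than maxEE "expected errors" will be discarded. Expected errors are calculated from the nominal definition of the quality score: EE = sum(10^(-Q/10))
rm.phix: If TRUE, discard reads that match against the phiX genome, as determined by isPhiX.
n: The number of reads to read in and filter at any one time. Default is 1e6, one-million reads. See FastqStreamer for details.
...: Arguments passed on to isPhiX.

fastqFilter replicates most of the functionality of the fastq_filter command in usearch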
(http://www.drive5.com/usearch/manual/cmd_fastq_filter.html). It adds the ability to remove
contaminating phiX sequences as part of the filtering process.
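
The filtering criteria described above can be illustrated on a single read with a short base-R sketch. This is not the dada2 implementation: the function names expected_errors and filter_read_sketch are hypothetical, the order in which the steps are applied is an assumption, and quality scores are taken as already-decoded integer Phred values rather than fastq quality strings.

```r
# Expected errors from the nominal definition: EE = sum(10^(-Q/10))
expected_errors <- function(q) sum(10^(-q / 10))

# Hypothetical single-read filter. Returns the processed read as a list,
# or NULL if the read would be discarded. The step order is an assumption.
filter_read_sketch <- function(seq, q, truncQ = 2, truncLen = 0,
                               trimLeft = 0, maxN = 0, maxEE = Inf) {
  stopifnot(nchar(seq) == length(q))

  # Truncate at the first quality score less than or equal to truncQ
  low <- which(q <= truncQ)
  if (length(low) > 0) {
    keep <- low[1] - 1
    seq <- substr(seq, 1, keep)
    q <- q[seq_len(keep)]
  }

  # Truncate after truncLen bases; shorter reads are discarded
  if (truncLen > 0) {
    if (nchar(seq) < truncLen) return(NULL)
    seq <- substr(seq, 1, truncLen)
    q <- q[seq_len(truncLen)]
  }

  # Remove trimLeft bases from the start of the read
  if (trimLeft > 0) {
    if (nchar(seq) <= trimLeft) return(NULL)
    seq <- substr(seq, trimLeft + 1, nchar(seq))
    q <- q[-seq_len(trimLeft)]
  }

  # Discard reads with more than maxN ambiguous bases
  if (sum(charToRaw(seq) == charToRaw("N")) > maxN) return(NULL)

  # Discard reads with more than maxEE expected errors
  if (expected_errors(q) > maxEE) return(NULL)

  list(seq = seq, q = q)
}

expected_errors(c(10, 20))  # 0.1 + 0.01 = 0.11

# truncLen = 8 and trimLeft = 2 leave a read of length 8 - 2 = 6
res <- filter_read_sketch("ACGTACGTAC", rep(30, 10), truncLen = 8, trimLeft = 2)
nchar(res$seq)
```

A read that fails any criterion returns NULL in this sketch, mirroring the "will be discarded" wording of the argument descriptions.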
fastqPairedFilter

testFastq = system.file("extdata", "sam1F.fastq.gz", package="dada2")
filtFastq <- tempfile(fileext=".fastq.gz")
fastqFilter(testFastq, filtFastq, maxN=0, maxEE=2)
fastqFilter(testFastq, filtFastq, trimLeft=10, truncLen=200, maxEE=2, verbose=TRUE)
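
One way to sanity-check a filtered file is to look at the read lengths it contains. The base-R helper below is a hypothetical sketch, not part of dada2 or ShortRead, and assumes the common four-lines-per-record fastq layout. It is demonstrated on a tiny file written on the spot so that it runs without any packages installed.

```r
# Hypothetical helper: read lengths from a (possibly gzipped) fastq file,
# assuming each record occupies exactly four lines.
fastq_read_lengths <- function(path) {
  con <- gzfile(path, open = "rt")  # gzfile also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con)
  if (length(lines) < 2) return(integer(0))
  # The sequence is the second line of every four-line record
  nchar(lines[seq(2, length(lines), by = 4)])
}

# Small self-contained demonstration on a two-read gzipped file
demo <- tempfile(fileext = ".fastq.gz")
con <- gzfile(demo, open = "wt")
writeLines(c("@r1", "ACGTAC", "+", "IIIIII",
             "@r2", "ACGT", "+", "IIII"), con)
close(con)
fastq_read_lengths(demo)  # 6 4

# With dada2 installed and the example above run, one could inspect:
# table(fastq_read_lengths(filtFastq))
```

Given the argument descriptions, reads written by the second call (trimLeft=10, truncLen=200) would be expected to all have length 200 - 10 = 190.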