qa2: (Updated) quality assessment reports on short reads

Description

This page summarizes an updated approach to quality assessment reports in ShortRead.

Usage

## Input source for short reads
QAFastqSource(con = character(), n = 1e+06, readerBlockSize = 1e+08, flagNSequencesRange = NA_integer_, ..., html = system.file("template", "QASources.html", package="ShortRead"))
QAData(seq = ShortReadQ(), filter = logical(length(seq)), ...)
## Possible QA elements
QAFrequentSequence(useFilter = TRUE, addFilter = TRUE, n = NA_integer_, a = NA_integer_, flagK=.8, reportSequences = FALSE, ...)
QANucleotideByCycle(useFilter = TRUE, addFilter = TRUE, ...)
QANucleotideUse(useFilter = TRUE, addFilter = TRUE, ...)
QAQualityByCycle(useFilter = TRUE, addFilter = TRUE, ...)
QAQualityUse(useFilter = TRUE, addFilter = TRUE, ...)
QAReadQuality(useFilter = TRUE, addFilter = TRUE, flagK = 0.2, flagA = 30L, ...)
QASequenceUse(useFilter = TRUE, addFilter = TRUE, ...)
QAAdapterContamination(useFilter = TRUE, addFilter = TRUE, Lpattern = NA_character_, Rpattern = NA_character_, max.Lmismatch = 0.1, max.Rmismatch = 0.2, min.trim = 9L, ...)
## Order QA report elements
QACollate(src, ...)
## perform analysis
qa2(object, state, ..., verbose=FALSE)
## Outputs from qa2
QA(src, filtered, flagged, ...)
QAFiltered(useFilter = TRUE, addFilter = TRUE, ...)
QAFlagged(useFilter = TRUE, addFilter = TRUE, ...)
## Summarize results as html report
"report"(x, ..., dest = tempfile(), type = "html")
## additional methods; 'flag' is not fully implemented
flag(object, ..., verbose=FALSE)
"rbind"(..., deparse.level = 1)

Arguments

con

character(1) file location of fastq input, as used by FastqSampler.

n

integer(1) number of records to input, as used by FastqStreamer (QAFastqSource). integer(1) number of sequences to tag as ‘frequent’ (QAFrequentSequence).

readerBlockSize

integer(1) number of bytes to input, as used by FastqStreamer.

flagNSequencesRange

integer(2) minimum and maximum number of reads; source files with read counts outside this range will be flagged as outliers.

html

character(1) location of the HTML template for summarizing this report element.

seq

ShortReadQ representation of fastq data.

filter

logical() vector with length equal to seq, indicating whether elements of seq are filtered (TRUE) or not.

useFilter, addFilter

logical(1) indicating whether the QA element should be calculated using the filtered reads (useFilter = TRUE) or all reads, and whether reads failing the QA element should be added to the filter used by subsequent steps (addFilter = TRUE) or not.

a

integer(1) count of number of sequences above which a read will be considered ‘frequent’ (QAFrequentSequence).

flagK, flagA

flagK numeric(1) between 0 and 1 indicating the fraction of frequent sequences greater than or equal to n or a above which a fastq file will be flagged (QAFrequentSequence). flagK numeric(1) between 0 and 1 and flagA integer(1) indicating that a run should be flagged when the fraction of reads with quality greater than or equal to flagA falls below threshold flagK (QAReadQuality).

reportSequences

logical(1) indicating whether frequent sequences are to be reported.

Lpattern, Rpattern, max.Lmismatch, max.Rmismatch, min.trim

Parameters influencing adapter identification, see matchPattern.

src

The source, e.g., QAFastqSource, on which the quality assessment report will be based.

object

An instance of class derived from QA on which quality metrics will be derived; for end users, this is usually the result of QACollate.

state

The data on which quality assessment will be performed; this is not usually necessary for end-users.

verbose

logical(1) indicating whether progress messages should be printed.

filtered, flagged

Primarily for internal use, instances of QAFiltered and QAFlagged.

x

An instance of QA on which a report is to be generated.

dest

character(1) providing the directory in which the report is to be generated.

type

character(1) indicating the type of report to be generated; only “html” is supported.

deparse.level

see rbind.

...

Additional arguments, e.g., html to specify the location of the html source to use as a template for the report.

Value

An object derived from class .QA. Values contained in this object are meant for use by report.

Details

Use QACollate to specify an order in which components of a QA report are to be assembled. The first argument is the data source (e.g., QAFastqSource).

Functions related to data input include:

QAFastqSource

defines the location of fastq files to be included in the report. con is used to construct a FastqSampler instance, and records are processed using qa2,QAFastqSource-method.

QAData

is a class for representing the data during the QA report generation pass; it is primarily for internal use.
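As a minimal sketch of the input step, the following constructs a QAFastqSource with non-default sampling settings and passes it as the first argument of QACollate. The file paths reuse the example data from the Examples section; the particular values of n and readerBlockSize are arbitrary illustrations, not recommendations.

```r
library(ShortRead)

# Example fastq files shipped with ShortRead (the same files used in the
# Examples section of this page).
dirPath <- system.file(package = "ShortRead", "extdata", "E-MTAB-1147")
fls <- dir(dirPath, "fastq.gz", full.names = TRUE)

# Input fewer records per file and read smaller blocks than the defaults
# (n = 1e6, readerBlockSize = 1e8). These values are arbitrary choices
# made only for illustration.
src <- QAFastqSource(fls, n = 1e5, readerBlockSize = 1e7)

# The source is the first argument to QACollate; report elements follow.
coll <- QACollate(src, QAReadQuality(), QAQualityByCycle())
```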

Possible elements in a QA report are:

QAFrequentSequence

identifies the most commonly occurring sequences. One of n or a can be non-NA, and determines the number of frequent sequences reported. n specifies the number of most-frequent sequences to filter, e.g., n=10 would filter the top 10 most commonly occurring sequences; a provides a threshold frequency (count) above which reads are filtered. The sample is flagged when a fraction flagK of the reads are filtered.

reportSequences determines whether the most commonly occurring sequences, as determined by n or a, are printed in the html report.

QANucleotideByCycle

reports nucleotide frequency as a function of cycle.

QAQualityByCycle

reports average quality score as a function of cycle.

QAQualityUse

summarizes overall nucleotide qualities.

QAReadQuality

summarizes the distribution of read qualities.

QASequenceUse

summarizes the cumulative distribution of reads occurring 1, 2, ... times.

QAAdapterContamination

reports the occurrence of ‘adapter’ sequences on the left and / or right end of each read.
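The flagging rules described for QAReadQuality (flagK, flagA) and QAFrequentSequence (n, a, flagK) can be illustrated with a small base R sketch on toy data. This is not ShortRead's internal implementation: the helper names flag_read_quality and flag_frequent are hypothetical, and the choice of strict comparisons at the thresholds is an assumption.

```r
# Illustrative base R only; NOT the code ShortRead runs internally.

# QAReadQuality-style rule: flag when the fraction of reads with quality
# greater than or equal to flagA falls below flagK.
flag_read_quality <- function(read_quality, flagK = 0.2, flagA = 30L) {
    frac_good <- mean(read_quality >= flagA)
    list(fraction = frac_good, flagged = frac_good < flagK)
}

# QAFrequentSequence-style rule: choose 'frequent' sequences either as the
# top n by count or as those with count above a, then flag when the
# fraction of reads that are frequent exceeds flagK (assumed strict).
flag_frequent <- function(reads, n = NA_integer_, a = NA_integer_,
                          flagK = 0.8) {
    stopifnot(xor(is.na(n), is.na(a)))  # exactly one of n, a is set
    counts <- sort(table(reads), decreasing = TRUE)
    frequent <- if (!is.na(n)) {
        names(head(counts, n))
    } else {
        names(counts)[counts > a]
    }
    frac <- mean(reads %in% frequent)
    list(frequent = frequent, fraction = frac, flagged = frac > flagK)
}

# Toy inputs: per-read average qualities and a handful of read sequences.
quality <- c(35, 32, 12, 8, 31, 10, 9, 11, 7, 5)
reads <- c(rep("ACGT", 6), rep("TTTT", 3), "GGCA")

rq <- flag_read_quality(quality)   # 3 of 10 reads have quality >= 30
by_n <- flag_frequent(reads, n = 1)  # only the single most common sequence
by_a <- flag_frequent(reads, a = 2)  # every sequence seen more than twice
```

With these toy inputs, rq and by_n are not flagged, while by_a is flagged because 9 of the 10 reads belong to sequences seen more than twice.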

Examples

library(ShortRead)

## example fastq files shipped with ShortRead
dirPath <- system.file(package="ShortRead", "extdata", "E-MTAB-1147")
fls <- dir(dirPath, "fastq.gz", full.names=TRUE)

coll <- QACollate(QAFastqSource(fls), QAReadQuality(),
    QAAdapterContamination(), QANucleotideUse(),
    QAQualityUse(), QASequenceUse(),
    QAFrequentSequence(n=10), QANucleotideByCycle(),
    QAQualityByCycle())
x <- qa2(coll, BPPARAM=SerialParam(), verbose=TRUE)

res <- report(x)
if (interactive())
    browseURL(res)