ShortRead (version 1.30.0)

qa2: (Updated) quality assessment reports on short reads

Description

This page summarizes an updated approach to quality assessment reports in ShortRead.

Usage

## Input source for short reads
QAFastqSource(con = character(), n = 1e+06, readerBlockSize = 1e+08,
    flagNSequencesRange = NA_integer_, ...,
    html = system.file("template", "QASources.html", package="ShortRead"))
QAData(seq = ShortReadQ(), filter = logical(length(seq)), ...)

## Possible QA elements
QAFrequentSequence(useFilter = TRUE, addFilter = TRUE,
    n = NA_integer_, a = NA_integer_, flagK = 0.8,
    reportSequences = FALSE, ...)
QANucleotideByCycle(useFilter = TRUE, addFilter = TRUE, ...)
QANucleotideUse(useFilter = TRUE, addFilter = TRUE, ...)
QAQualityByCycle(useFilter = TRUE, addFilter = TRUE, ...)
QAQualityUse(useFilter = TRUE, addFilter = TRUE, ...)
QAReadQuality(useFilter = TRUE, addFilter = TRUE,
    flagK = 0.2, flagA = 30L, ...)
QASequenceUse(useFilter = TRUE, addFilter = TRUE, ...)
QAAdapterContamination(useFilter = TRUE, addFilter = TRUE,
    Lpattern = NA_character_, Rpattern = NA_character_,
    max.Lmismatch = 0.1, max.Rmismatch = 0.2, min.trim = 9L, ...)

## Order QA report elements
QACollate(src, ...)

## Perform analysis
qa2(object, state, ..., verbose=FALSE)

## Outputs from qa2
QA(src, filtered, flagged, ...)
QAFiltered(useFilter = TRUE, addFilter = TRUE, ...)
QAFlagged(useFilter = TRUE, addFilter = TRUE, ...)

## Summarize results as html report
"report"(x, ..., dest = tempfile(), type = "html")

## Additional methods; 'flag' is not fully implemented
flag(object, ..., verbose=FALSE)
"rbind"(..., deparse.level = 1)

Arguments

con
character(1) file location of fastq input, as used by FastqSampler.
n
integer(1) number of records to input, as used by FastqStreamer (QAFastqSource). integer(1) number of sequences to tag as ‘frequent’ (QAFrequentSequence).
readerBlockSize
integer(1) number of bytes to input, as used by FastqStreamer.
flagNSequencesRange
integer(2) minimum and maximum number of reads; source files with read counts outside this range will be flagged as outliers.
html
character(1) location of the HTML template for summarizing this report element.
seq
ShortReadQ representation of fastq data.
filter
logical() vector with length equal to seq, indicating whether elements of seq are filtered (TRUE) or not.
useFilter, addFilter
logical(1) indicating whether the QA element should be calculated using the filtered reads (useFilter = TRUE) or all reads, and whether reads failing the QA element should be added to the filter used by subsequent steps (addFilter = TRUE) or not.
a
integer(1) number of occurrences above which a sequence will be considered ‘frequent’ (QAFrequentSequence).
flagK, flagA
flagK numeric(1) between 0 and 1 indicating the fraction of frequent sequences, as determined by n or a, above which a fastq file will be flagged (QAFrequentSequence). flagK numeric(1) between 0 and 1 and flagA integer(1) indicating that a run should be flagged when the fraction of reads with quality greater than or equal to flagA falls below threshold flagK (QAReadQuality).
reportSequences
logical(1) indicating whether frequent sequences are to be reported.
Lpattern, Rpattern, max.Lmismatch, max.Rmismatch, min.trim
Parameters influencing adapter identification, see matchPattern.
src
The source, e.g., QAFastqSource, on which the quality assessment report will be based.
object
An instance of class derived from QA on which quality metrics will be derived; for end users, this is usually the result of QACollate.
state
The data on which quality assessment will be performed; this is not usually necessary for end-users.
verbose
logical(1) indicating whether progress messages should be printed.
filtered, flagged
Primarily for internal use, instances of QAFiltered and QAFlagged.
x
An instance of QA on which a report is to be generated.
dest
character(1) providing the directory in which the report is to be generated.
type
character(1) indicating the type of report to be generated; only “html” is supported.
deparse.level
see rbind.
...
Additional arguments, e.g., html to specify the location of the html source to use as a template for the report.

Value

An object derived from class .QA. Values contained in this object are meant for use by report.

Details

Use QACollate to specify an order in which components of a QA report are to be assembled. The first argument is the data source (e.g., QAFastqSource).

Functions related to data input include:

QAFastqSource
defines the location of fastq files to be included in the report. con is used to construct a FastqSampler instance, and records are processed using qa2,QAFastqSource-method.

QAData
is a class for representing the data during the QA report generation pass; it is primarily for internal use.

Possible elements in a QA report are:

QAFrequentSequence
identifies the most commonly occurring sequences. One of n or a can be non-NA, and determines the number of frequent sequences reported. n specifies the number of most-frequent sequences to filter, e.g., n=10 would filter the top 10 most commonly occurring sequences; a provides a threshold frequency (count) above which reads are filtered. The sample is flagged when a fraction flagK of the reads are filtered.

reportSequences determines whether the most commonly occurring sequences, as determined by n or a, are printed in the html report.

QANucleotideByCycle
reports nucleotide frequency as a function of cycle.

QAQualityByCycle
reports average quality score as a function of cycle.

QAQualityUse
summarizes overall nucleotide qualities.

QAReadQuality
summarizes the distribution of read qualities.

QASequenceUse
summarizes the cumulative distribution of reads occurring 1, 2, ... times.

QAAdapterContamination
reports the occurrence of ‘adapter’ sequences on the left and / or right end of each read.
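
The difference between the n and a arguments of QAFrequentSequence, and the role of flagK, can be illustrated with a small base R sketch. This is not the ShortRead implementation: the toy reads vector and the variable names frequent_by_n, frequent_by_a, frac_n and frac_a are hypothetical, and the exact boundary handling (strictly above a, at least flagK) is an assumption made for illustration only.

```r
# Toy reads; in practice these would come from a fastq file.
reads <- c("ACGT", "ACGT", "ACGT", "TTTT", "TTTT", "GGCA", "CATG", "ACGT")

# Count how often each distinct sequence occurs, most frequent first.
counts <- sort(table(reads), decreasing = TRUE)

# 'n'-style selection: the n most frequent sequences.
n <- 1L
frequent_by_n <- names(counts)[seq_len(n)]

# 'a'-style selection: sequences whose count is above the threshold a
# (assumption: strictly greater than a).
a <- 1L
frequent_by_a <- names(counts)[as.vector(counts) > a]

# Fraction of reads that would be filtered under each rule.
frac_n <- mean(reads %in% frequent_by_n)
frac_a <- mean(reads %in% frequent_by_a)

# Flag the sample when the filtered fraction reaches flagK
# (assumption: greater than or equal to flagK).
flagK <- 0.8
flag_n <- frac_n >= flagK
flag_a <- frac_a >= flagK
```

With these toy values, n = 1 selects only the single most common sequence, whereas a = 1 selects every sequence seen more than once, so the two arguments can filter different numbers of reads from the same data.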

See Also

QA.

Examples

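library(ShortRead)

## locate the example fastq files shipped with the package
dirPath <- system.file(package="ShortRead", "extdata", "E-MTAB-1147")
fls <- dir(dirPath, "fastq.gz", full.names=TRUE)

## data source first, then report elements in the desired order
coll <- QACollate(QAFastqSource(fls), QAReadQuality(),
    QAAdapterContamination(), QANucleotideUse(),
    QAQualityUse(), QASequenceUse(),
    QAFrequentSequence(n=10), QANucleotideByCycle(),
    QAQualityByCycle())

## perform the analysis serially, printing progress messages
x <- qa2(coll, BPPARAM=SerialParam(), verbose=TRUE)

## generate the html report and view it in interactive sessions
res <- report(x)
if (interactive())
    browseURL(res)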
