This page summarizes an updated approach to quality assessment reports in ShortRead.
## Input source for short reads
QAFastqSource(con = character(), n = 1e+06, readerBlockSize = 1e+08, flagNSequencesRange = NA_integer_, ..., html = system.file("template", "QASources.html", package="ShortRead"))
QAData(seq = ShortReadQ(), filter = logical(length(seq)), ...)
## Possible QA elements
QAFrequentSequence(useFilter = TRUE, addFilter = TRUE, n = NA_integer_, a = NA_integer_, flagK=.8, reportSequences = FALSE, ...)
QANucleotideByCycle(useFilter = TRUE, addFilter = TRUE, ...)
QANucleotideUse(useFilter = TRUE, addFilter = TRUE, ...)
QAQualityByCycle(useFilter = TRUE, addFilter = TRUE, ...)
QAQualityUse(useFilter = TRUE, addFilter = TRUE, ...)
QAReadQuality(useFilter = TRUE, addFilter = TRUE, flagK = 0.2, flagA = 30L, ...)
QASequenceUse(useFilter = TRUE, addFilter = TRUE, ...)
QAAdapterContamination(useFilter = TRUE, addFilter = TRUE, Lpattern = NA_character_, Rpattern = NA_character_, max.Lmismatch = 0.1, max.Rmismatch = 0.2, min.trim = 9L, ...)
## Order QA report elements
QACollate(src, ...)
## perform analysis
qa2(object, state, ..., verbose=FALSE)
## Outputs from qa2
QA(src, filtered, flagged, ...)
QAFiltered(useFilter = TRUE, addFilter = TRUE, ...)
QAFlagged(useFilter = TRUE, addFilter = TRUE, ...)
## Summarize results as html report
report(x, ..., dest = tempfile(), type = "html")
## additional methods; 'flag' is not fully implemented
flag(object, ..., verbose=FALSE)
rbind(..., deparse.level = 1)
con: character(1) file location of fastq input, as used by FastqSampler.

n: integer(1) number of records to input, as used by FastqStreamer (QAFastqSource); integer(1) number of sequences to tag as frequent (QAFrequentSequence).

readerBlockSize: reader block size, as used by FastqStreamer.

flagNSequencesRange: integer(2) minimum and maximum number of reads; source files with read counts outside this range will be flagged as outliers.

html: character(1) location of the HTML template for summarizing this report element.

seq: ShortReadQ representation of fastq data.

filter: logical() vector with length equal to seq, indicating whether elements of seq are filtered (TRUE) or not.

useFilter, addFilter: logical(1) indicating whether the QA element should be calculated using the filtered (useFilter=TRUE) or all reads, and whether reads failing the QA element should be added to the filter used by subsequent steps (addFilter = TRUE) or not.

a: integer(1) count of number of sequences above which a read will be considered frequent (QAFrequentSequence).

flagK, flagA: for QAFrequentSequence, flagK is numeric(1) between 0 and 1 indicating the fraction of frequent sequences greater than or equal to n or a above which a fastq file will be flagged. For QAReadQuality, flagK is numeric(1) between 0 and 1 and flagA is integer(1), indicating that a run should be flagged when the fraction of reads with quality greater than or equal to flagA falls below threshold flagK.

reportSequences: logical(1) indicating whether frequent sequences are to be reported.

Lpattern, Rpattern, max.Lmismatch, max.Rmismatch, min.trim: see matchPattern.

src: the source, e.g., QAFastqSource, on which the quality assessment report will be based.

object: a QA object on which quality metrics will be derived; for end users, this is usually the result of QACollate.

verbose: logical(1) indicating whether progress reports should be reported.

filtered, flagged: QAFiltered and QAFlagged objects.

x: a QA object on which a report is to be generated.

dest: character(1) providing the directory in which the report is to be generated.

type: character(1) indicating the type of report to be generated; only html is supported.

deparse.level: see rbind.

...: additional arguments, e.g., html to specify the location of the html source to use as a template for the report.

The result is an object derived from .QA. Values contained in this object are meant for use by report.
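The flagging rule documented for QAReadQuality (flag a run when the fraction of reads with quality greater than or equal to flagA falls below flagK) can be illustrated with a small base R sketch. This is not ShortRead's implementation: `flag_read_quality` is a hypothetical helper, and treating "quality" as one numeric score per read is an assumption made for illustration.

```r
# Hypothetical helper, NOT part of ShortRead: illustrates the documented rule
# "flag when the fraction of reads with quality >= flagA falls below flagK".
# Assumption: each read is summarized by a single numeric quality score.
flag_read_quality <- function(readQuality, flagK = 0.2, flagA = 30L) {
    stopifnot(is.numeric(readQuality), flagK >= 0, flagK <= 1)
    fracGood <- mean(readQuality >= flagA)   # fraction of reads at or above flagA
    list(fraction = fracGood, flagged = fracGood < flagK)
}

# Made-up per-read scores, for illustration only
toy <- c(12, 35, 28, 31, 8, 15, 22, 19, 40, 10)
res <- flag_read_quality(toy)
res$fraction   # 0.3: three of ten scores are >= 30
res$flagged    # FALSE: 0.3 is not below the default flagK of 0.2
```

Raising the threshold, e.g. `flag_read_quality(toy, flagK = 0.5)`, would flag the same toy data, since 0.3 falls below 0.5.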
Use QACollate to specify an order in which components of a QA report are to be assembled. The first argument is the data source (e.g., QAFastqSource).
Functions related to data input include:

QAFastqSource: con is used to construct a FastqSampler instance, and records are processed using qa2,QAFastqSource-method.

QAData
Possible elements in a QA report are:

QAFrequentSequence: n or a can be non-NA, and determines the number of frequent sequences reported. n specifies the number of most-frequent sequences to filter, e.g., n=10 would filter the top 10 most commonly occurring sequences; a provides a threshold frequency (count) above which reads are filtered. The sample is flagged when a fraction flagK of the reads are filtered. reportSequences determines whether the most commonly occurring sequences, as determined by n or a, are printed in the html report.

QANucleotideByCycle

QAQualityByCycle

QAQualityUse

QAReadQuality

QASequenceUse

QAAdapterContamination
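The difference between n (a rank cutoff) and a (a count threshold) in QAFrequentSequence can be sketched in base R. This is only an illustration of the description above, not the package's code: `filter_frequent` is a hypothetical helper, requiring exactly one of n or a is an assumption, and so is flagging when the filtered fraction exceeds flagK.

```r
# Hypothetical helper, NOT part of ShortRead.
# Assumptions: exactly one of 'n' or 'a' is supplied; a sample is flagged
# when the fraction of filtered reads exceeds 'flagK'.
filter_frequent <- function(reads, n = NA_integer_, a = NA_integer_,
                            flagK = 0.8) {
    stopifnot(xor(is.na(n), is.na(a)))
    counts <- sort(table(reads), decreasing = TRUE)
    frequent <- if (!is.na(n)) {
        # rank cutoff: the n most commonly occurring sequences
        names(counts)[seq_len(min(n, length(counts)))]
    } else {
        # count threshold: sequences occurring more than 'a' times
        names(counts)[as.vector(counts) > a]
    }
    filtered <- reads %in% frequent
    list(frequent = frequent, filtered = filtered,
         flagged = mean(filtered) > flagK)
}

# Made-up reads, for illustration only
reads <- c("ACGT", "ACGT", "ACGT", "TTTT", "TTTT", "GGCA", "CATG")
byRank  <- filter_frequent(reads, n = 1L)   # filters only "ACGT"
byCount <- filter_frequent(reads, a = 1L)   # filters "ACGT" and "TTTT"
```

With these toy reads, the rank cutoff filters 3 of 7 reads and the count threshold filters 5 of 7; neither exceeds the default flagK of 0.8, so neither call flags the sample.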
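See also: QA.

library(ShortRead)  # attaches BiocParallel, which provides SerialParam()

## Locate the example fastq files shipped with ShortRead
dirPath <- system.file(package="ShortRead", "extdata", "E-MTAB-1147")
fls <- dir(dirPath, "fastq.gz", full.names=TRUE)

## Data source first, then QA elements in the order they should be reported
coll <- QACollate(QAFastqSource(fls), QAReadQuality(),
    QAAdapterContamination(), QANucleotideUse(),
    QAQualityUse(), QASequenceUse(),
    QAFrequentSequence(n=10), QANucleotideByCycle(),
    QAQualityByCycle())
x <- qa2(coll, BPPARAM=SerialParam(), verbose=TRUE)

## Generate the html report and open it in interactive sessions
res <- report(x)
if (interactive())
    browseURL(res)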