QuasR (version 1.12.0)

qQCReport: QuasR Quality Control Report

Description

Generate quality control plots for a qProject object or a vector of fasta/fastq/bam files. The available plots vary depending on the type of input (fasta, fastq or bam files).

Usage

qQCReport(input, pdfFilename=NULL, chunkSize=1e6L, useSampleNames=FALSE, clObj=NULL, ...)

Arguments

input
A vector of files or a qProject object as returned by qAlign
pdfFilename
The path and name of a pdf file to store the report. If NULL, the quality control plots will be generated in separate plotting windows on the standard graphical device.
chunkSize
The number of sequences, sequence pairs (for paired-end data) or alignments that will be sampled from each data file to collect quality statistics
useSampleNames
If TRUE, the plots will be labelled using the sample names instead of the file names. Sample names are obtained from the qProject object, or from names(input) if input is a named vector of file names. Please note that if there are multiple files for the same sample, the sample names will not be unique.
clObj
a cluster object to be used for parallel processing of multiple input files.
...
additional arguments that will be passed to the functions generating the individual quality control plots, see ‘Details’.

Value

The function is called for its side effect of generating quality control plots. It invisibly returns a list with components that contain the data used to generate each of the QC plots. Available components are (depending on input data, see ‘Details’ below):
  • qualByCycle: quality score boxplot
  • nuclByCycle: nucleotide frequency plot
  • duplicated: duplication level plot
  • mappings: mapping statistics barplot
  • uniqueness: library complexity barplot
  • errorsByCycle: mismatch frequency plot
  • mismatchTypes: mismatch type plot
  • fragDistribution: fragment size distribution plot

Details

This function generates quality control plots for all input files or the sequence and alignment files contained in a qProject object, allowing assessment of the quality of a sequencing experiment. qQCReport uses functionality from the ShortRead package to collect quality data, and visualizes the results similarly to the ‘FastQC’ quality control tool by Simon Andrews (see ‘References’ below). It is recommended to create PDF reports (pdfFilename argument), for which the plot layouts have been optimised. Some plots will only be generated if the necessary information is available (e.g. base qualities in fastq sequence files).

The currently available plot types are:

Quality score boxplot
shows the distribution of base quality values as a box plot for each position in the input sequence. The background color (green, orange or red) indicates ranges of high, intermediate and low qualities. The plot is available for fastq and bam files.

Nucleotide frequency
shows the frequency of A, C, G, T and N bases by position in the read. The plot is always available.

Duplication level
shows for each sample the fraction of reads observed at different duplication levels (e.g. once, twice, three times, etc.). In addition, the most frequent sequences are listed. The plot is available for fasta, fastq and bam files.

Mapping statistics
shows fractions of reads that were (un)mappable to the reference genome. This plot is available for bam input.

Library complexity
shows fractions of unique read(-pair) alignment positions, as a measure of the complexity in the sequencing library. Please note that this measure is not independent from the total number of reads in a library, and is best compared between libraries of similar sizes. This plot is available for bam input.

Mismatch frequency
shows the frequency and position (relative to the read sequence) of mismatches in the alignments against the reference genome. The plot is available for bam input.

Mismatch types
shows the frequency of read bases that caused mismatches in the alignments to the reference genome, separately for each genome base. This plot is available for bam input.

Fragment size
shows the distribution of fragment sizes inferred from aligned read pairs. This plot is available for paired-end bam input.
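
The note under ‘Library complexity’ (the measure depends on the total number of reads) can be illustrated with a small base-R simulation. This is only a sketch: the toy library of 5000 distinct positions, the helper uniqueFraction and the definition "distinct positions divided by alignments" are assumptions made for illustration and may differ from how QuasR computes the plotted values.

```r
set.seed(1)

# Toy library: 5000 distinct alignment positions (hypothetical numbers).
positions <- seq_len(5000)

# Assumed definition for this sketch:
# fraction of unique positions = distinct positions / number of alignments.
uniqueFraction <- function(n) {
  aligned <- sample(positions, size = n, replace = TRUE)
  length(unique(aligned)) / n
}

shallow <- uniqueFraction(1000)   # few reads sampled from the library
deep    <- uniqueFraction(20000)  # many reads sampled from the same library

# The same library looks less "complex" when sequenced more deeply.
print(c(shallow = shallow, deep = deep))
```

Because both values come from the same underlying library, the difference is caused by sequencing depth alone, which is why the measure is best compared between libraries of similar size.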

One approach to assess the quality of a sample is to compare its control plots to the ones from other samples and search for relative differences. Special quality measures are expected for certain types of experiments: for example, a genomic re-sequencing sample with an overrepresentation of T bases may be suspicious, while such a nucleotide bias is normal for a directed bisulfite-sequencing sample.
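
A nucleotide bias such as the T overrepresentation mentioned above can also be checked numerically. The following base-R sketch computes per-position base frequencies for a handful of made-up reads; the reads and variable names are hypothetical, and the code does not reproduce QuasR's internal calculation.

```r
# Made-up reads of equal length (hypothetical data).
reads <- c("ACGTT", "ATGTT", "TCGTA", "ACTTN")
alphabet <- c("A", "C", "G", "T", "N")

# Split reads into a matrix: one row per read, one column per position.
bases <- do.call(rbind, strsplit(reads, ""))

# Frequency of each base at each position (rows: bases, columns: positions).
freq <- sapply(seq_len(ncol(bases)), function(j) {
  as.vector(table(factor(bases[, j], levels = alphabet))) / nrow(bases)
})
rownames(freq) <- alphabet

# Overall fraction of T bases, a simple summary to compare between samples.
tFraction <- mean(bases == "T")

print(round(freq, 2))
print(tFraction)
```

Comparing such summaries between samples of the same experiment type is one way to spot a sample that deviates from the rest.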

Additional arguments can be passed to the internal functions that generate the individual quality control plots using ...:

lmat:
a matrix (e.g. matrix(1:12, ncol=2)) used by an internal call to the layout function to specify the positioning of multiple plot panels on a device page. Individual panels correspond to different samples.

breaks:
a numerical vector (e.g. c(1:10)) defining the bins used by the ‘Duplication level’ plot.
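
The two arguments above can be explored in base R before passing them to qQCReport. This is a sketch under stated assumptions: the duplication levels are made-up values, and pooling everything at or beyond the last break into one open-ended bin is an assumption, not a description of QuasR's internal binning.

```r
# lmat: matrix(1:12, ncol = 2) is a 6 x 2 matrix, filled column-wise,
# so panels 1-6 go into the left column and panels 7-12 into the right one.
lmat <- matrix(1:12, ncol = 2)

pdf(file = NULL)              # null device: nothing is written or displayed
nPanels <- layout(lmat)       # layout() returns the number of panels
invisible(dev.off())

# breaks: bin made-up duplication levels (times each distinct sequence was seen).
breaks <- c(1:10)
dupLevel <- c(1, 1, 1, 2, 2, 3, 5, 12, 40)

# Assumption: levels >= the last break are pooled into one open-ended bin.
binLabels <- c(as.character(breaks[-length(breaks)]),
               paste0(">=", breaks[length(breaks)]))
binFactor <- factor(binLabels[findInterval(dupLevel, breaks)], levels = binLabels)

# Fraction of reads (not of distinct sequences) falling into each bin.
readsPerBin <- xtabs(dupLevel ~ binFactor)
readFraction <- readsPerBin / sum(readsPerBin)

print(nPanels)
print(round(readFraction, 3))
```

A matrix with fewer rows, for example matrix(1:6, ncol = 2), would place fewer and therefore larger sample panels on each page.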

References

FastQC quality control tool at http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

See Also

qProject, qAlign, ShortRead package

Examples

# copy example data to current working directory
file.copy(system.file(package="QuasR", "extdata"), ".", recursive=TRUE)

# create alignments
sampleFile <- "extdata/samples_chip_single.txt"
genomeFile <- "extdata/hg19sub.fa"

proj <- qAlign(sampleFile, genomeFile)

# create quality control report
qQCReport(proj, pdfFilename="qc_report.pdf")
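
As a further sketch, qQCReport can also be run directly on a named vector of sequence files, with parallel processing and sample-name labels. The file name pattern, the sample names and the reduced chunkSize are assumptions made for illustration, and the body is skipped when QuasR is not installed.

```r
# Sketch only: the body runs when QuasR (a Bioconductor package) is installed.
res <- NULL
if (requireNamespace("QuasR", quietly = TRUE)) {
  suppressPackageStartupMessages(library(QuasR))

  # Example FASTQ files shipped with QuasR; the file name pattern is an assumption.
  extdata <- system.file(package = "QuasR", "extdata")
  fq <- list.files(extdata, pattern = "^chip_.*\\.fq\\.bz2$", full.names = TRUE)

  if (length(fq) > 0) {
    names(fq) <- paste0("sample", seq_along(fq))  # hypothetical sample names

    cl <- parallel::makeCluster(2)                # cluster object for clObj
    res <- qQCReport(fq,
                     pdfFilename = file.path(tempdir(), "qc_report_files.pdf"),
                     chunkSize = 1e4L,            # sample fewer reads per file
                     useSampleNames = TRUE,
                     clObj = cl)
    parallel::stopCluster(cl)

    # The invisibly returned list holds the data behind each plot.
    print(names(res))
  }
}
```

Since the input here consists of fastq files only, the plots that require alignments (for example mapping statistics) would not be expected in the report.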
