EZQC: Run quality control checks

Description

EZQC() assesses multiple aspects of your NR-seq data and generates a number of plots visualizing dataset-wide trends.

Usage

EZQC(obj, ...)

Value

A list of ggplot2 objects visualizing the various aspects of your data assessed by EZQC().

Arguments

obj

EZbakRData or EZbakRFractions object.

...

Parameters passed to the class-specific method. If you have provided an EZbakRFractions object, then these can be any of the following, all of which play the same role as in EstimateKinetics(); that is, they get passed to EZget() to find the fractions table you are interested in (see ?EstimateKinetics() for details):

features
populations
fraction_design

If you have provided an EZbakRData object, then these can be any of the following, all of which serve the same purpose as in EstimateFractions() (see ?EstimateFractions() for details):

mutrate_populations
features
filter_cols
filter_condition
remove_features

Details

EZQC() checks the following aspects of your NR-seq data. If you have passed an EZbakRData object, then EZQC() checks:

Raw mutation rates: In all sequencing reads, how many T's in the reference were a C in the read? The hope is that raw mutation rates are higher than -label controls in all +label samples. Higher raw mutation rates, especially when using standard label times (e.g., 2 hours or more in mammalian systems), are typically a sign of good label incorporation and low labeled RNA/read dropout. If you don't have -label samples, know that background mutation rates are typically less than 0.2%, so +label raw mutation rates several times higher than this would be preferable.
Mutation rates in labeled and unlabeled reads: The raw mutation rate counts all mutations in all reads. In a standard NR-seq experiment performed with a single metabolic label, there are typically two populations of reads: 1) those from labeled RNA, having higher mutation rates due to chemical conversion/recoding of the metabolic label, and 2) those from unlabeled RNA, having lower, background levels of mutations. EZbakR fits a two-component mixture model to estimate the mutation rates in these two populations separately. A successful NR-seq experiment should have a labeled read mutation rate of > 1% and a low background mutation rate of < 0.3%.
Read count replicate correlation: Simply the log10 read count correlation for replicates, as inferred from your metadf.
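
To make the two mutation rate checks more concrete, here is a minimal base R sketch on simulated reads. It is not the code EZQC() or EstimateFractions() runs internally. The simulation settings (read count, T content, true mutation rates), the helper names, and the use of optim() to fit a two-component binomial mixture are all assumptions chosen for illustration.

```r
set.seed(42)

# Simulate reads from one hypothetical +label sample (all values are assumptions)
n_reads <- 2000   # number of reads
nT      <- 25     # reference T's covered by each read (held constant for simplicity)
p_new   <- 0.05   # T-to-C mutation rate in reads from labeled RNA
p_old   <- 0.002  # background mutation rate in reads from unlabeled RNA
labeled <- rbinom(n_reads, size = 1, prob = 0.5) == 1
TC      <- rbinom(n_reads, size = nT, prob = ifelse(labeled, p_new, p_old))

# Raw mutation rate: mutated T's divided by all reference T's, across all reads
raw_mutrate <- sum(TC) / (n_reads * nT)

# Negative log-likelihood of a two-component binomial mixture.
# Parameters live on the logit scale so the optimizer is unconstrained;
# the labeled rate is written as background + extra so it is always higher.
nll <- function(par) {
  bg  <- plogis(par[1])
  lab <- bg + (1 - bg) * plogis(par[2])
  fn  <- plogis(par[3])  # fraction of reads from labeled RNA
  -sum(log(fn * dbinom(TC, nT, lab) + (1 - fn) * dbinom(TC, nT, bg)))
}

# Fit by direct optimization from rough starting guesses
fit <- optim(c(qlogis(0.005), qlogis(0.03), 0), nll)

bg_hat  <- plogis(fit$par[1])
lab_hat <- bg_hat + (1 - bg_hat) * plogis(fit$par[2])

print(c(raw = raw_mutrate, background = bg_hat, labeled = lab_hat))
```

The raw rate is a read-weighted blend of the two populations, so it lands between the background and labeled estimates. That is why the mixture model estimates are compared against the > 1% and < 0.3% guidelines separately.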

If you have passed an EZbakRFractions object, i.e., the output of EstimateFractions(), then in addition to the checks in the EZbakRData input case, EZQC() also checks:

Fraction labeled distribution: This is the distribution of the feature-wise fraction labeled (or fraction high mutation content) estimates produced by EstimateFractions(). The "ideal" is a distribution with mean around 0.5, as this maximizes the amount of RNA with synthesis and degradation kinetics within the dynamic range of the experiment. In practice, you will (and should) be at least a bit lower than this, as longer label times risk physiological impacts of metabolic labeling.
Fraction labeled replicate correlation: This is the logit(fraction labeled) correlation between replicates, as inferred from your metadf.
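
The replicate correlation and distribution checks can be sketched in a few lines of base R. The numbers below are made up, and the vector names are hypothetical. In practice EZQC() pulls read counts and fraction estimates from your object and pairs replicates using your metadf, so this only illustrates the transformations involved.

```r
# Hypothetical per-feature read counts in two replicates
reads_rep1 <- c(100, 50, 10, 400)
reads_rep2 <- c(120, 40, 12, 380)

# Read count replicate correlation on the log10 scale
read_cor <- cor(log10(reads_rep1), log10(reads_rep2))

# Hypothetical per-feature fraction labeled estimates in the same replicates
fn_rep1 <- c(0.10, 0.35, 0.50, 0.80)
fn_rep2 <- c(0.12, 0.30, 0.55, 0.75)

# logit maps (0, 1) onto the whole real line; equivalent to stats::qlogis()
logit <- function(p) log(p / (1 - p))

# Fraction labeled replicate correlation on the logit scale
fn_cor <- cor(logit(fn_rep1), logit(fn_rep2))

# Mean of the fraction labeled distribution, to compare against the 0.5 "ideal"
mean_fn <- mean(c(fn_rep1, fn_rep2))

print(c(read_cor = read_cor, fn_cor = fn_cor, mean_fn = mean_fn))
```

Correlating on the logit scale rather than the raw scale spreads out estimates near 0 and 1, so disagreement between replicates at the extremes is not compressed.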

Examples



# Simulate data to analyze
simdata <- EZSimulate(30)

# Create EZbakR input
ezbdo <- EZbakRData(simdata$cB, simdata$metadf)

# Estimate Fractions
ezbdo <- EstimateFractions(ezbdo)

# Run QC
QC <- EZQC(ezbdo)
