Uses the strategy described here, which is similar to that originally presented in Berg et al. 2024.
CorrectDropout(
obj,
strategy = c("grandR", "bakR"),
grouping_factors = NULL,
features = NULL,
populations = NULL,
fraction_design = NULL,
repeatID = NULL,
exactMatch = TRUE,
read_cutoff = 25,
dropout_cutoff = 5,
...
)
An EZbakRData object with the specified "fractions" table replaced
with a dropout-corrected table.
An EZbakRFractions object, which is an EZbakRData object on which
you have run EstimateFractions().
Which dropout correction strategy to use. Options are:
grandR: Described here. Cite that work and grandR if using this strategy. Quasi-non-parametric strategy that finds an estimate of the dropout rate that eliminates any linear correlation between the newness of a transcript and the difference in +s4U and -s4U normalized read counts.
bakR: Described here. Uses a simple generative model of dropout to derive a likelihood function, and the dropout rate is estimated via the method of maximum likelihood.
The "bakR" strategy has the advantage of being model-derived, making it possible to assess model fit and thus whether the simple assumptions of both the "bakR" and "grandR" dropout models are met. The "grandR" strategy has the advantage of being more robust. Thus, the "grandR" strategy is currently used by default.
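The idea behind the "grandR" strategy can be illustrated with a toy simulation in base R. This is only a sketch under simplifying assumptions, not the code used by grandR or EZbakR: the simulated data, the helper `residual_cor()`, the use of Pearson correlation, and the correction formula (labeled RNA surviving with probability 1 - pdo) are all assumptions made for illustration.

```r
set.seed(42)

n <- 2000        # number of simulated features
pdo_true <- 0.3  # dropout rate used to simulate the data

# True fraction of each transcript that is new (labeled).
theta <- runif(n, 0.05, 0.95)

# Toy model: labeled RNA survives with probability 1 - pdo, so the observed
# newness and the +s4U:-s4U signal ratio both shrink as theta grows.
theta_obs <- theta * (1 - pdo_true) / (1 - theta * pdo_true)
log_ratio <- log(1 - theta * pdo_true) + rnorm(n, sd = 0.2)

# Correlation between corrected newness and corrected log ratio
# for a candidate dropout rate (hypothetical helper).
residual_cor <- function(pdo) {
  theta_corr <- theta_obs / ((1 - pdo) + theta_obs * pdo)
  ratio_corr <- log_ratio - log(1 - theta_corr * pdo)
  cor(theta_corr, ratio_corr)
}

# The correlation is negative when dropout is under-corrected and positive
# when it is over-corrected, so a root finder can locate the crossing point.
pdo_hat <- uniroot(residual_cor, interval = c(0, 0.95))$root
print(pdo_hat)
```

In this simulation the estimate should land close to the simulated dropout rate, because the corrected log ratio no longer depends on newness once the right value of pdo is applied.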
Which sample-detail columns in the metadf should be used
to group -s4U samples when calculating the average -s4U RPM? The default value of
NULL will cause all sample-detail columns to be used.
Character vector of the set of features by which reads were stratified
when estimating the proportions of each RNA population. With the default of NULL,
the EZbakRFractions object is expected to contain only one fractions table.
Mutational populations that were analyzed to generate the fractions table to use. For example, this would be "TC" for a standard s4U-based nucleotide recoding experiment.
"Design matrix" specifying which RNA populations exist
in your samples. By default, this will be created automatically and will assume
that all combinations of the mutrate_populations you have requested to analyze are
present in your data. If this is not the case for your data, then you will have
to create one manually. See docs for EstimateFractions (run ?EstimateFractions()) for more details.
If multiple fractions tables exist with the same metadata,
then this is the numerical index by which they are distinguished.
If TRUE, then features must exactly match the features
metadata for a given fractions table for it to be used. This means that you cannot
specify a subset of features by default. Set this to FALSE if you would like
to specify a feature subset.
Minimum number of reads for a feature to be used to fit the dropout model.
Maximum ratio of -s4U:+s4U RPMs for a feature to be used to fit the dropout model (i.e., simple outlier filtering cutoff).
Parameters passed to internal calculate_dropout() function;
namely dropout_cutoff_min, which sets the minimum dropout value used for
fitting the dropout model.
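As a rough illustration of how the read_cutoff and dropout_cutoff filters described above might act together, here is a base R sketch on a made-up table. The data frame, its column names, and the choice of inclusive comparisons are assumptions for this example; the internal filtering in EZbakR may differ in detail.

```r
# Hypothetical per-feature summary; column names are invented for this sketch.
feature_summary <- data.frame(
  feature   = c("geneA", "geneB", "geneC", "geneD"),
  reads     = c(10, 120, 300, 80),
  rpm_minus = c(2.0, 15.0, 40.0, 60.0),  # average -s4U RPM
  rpm_plus  = c(1.5, 12.0, 35.0, 6.0),   # +s4U RPM
  stringsAsFactors = FALSE
)

read_cutoff <- 25
dropout_cutoff <- 5

# Ratio of -s4U to +s4U RPMs; very large values are treated as outliers.
feature_summary$ratio <- feature_summary$rpm_minus / feature_summary$rpm_plus

# Keep features with enough reads and a ratio no larger than the cutoff.
# Whether the real comparisons are inclusive is an assumption here.
keep <- feature_summary$reads >= read_cutoff &
  feature_summary$ratio <= dropout_cutoff
used_for_fit <- feature_summary[keep, ]
print(used_for_fit$feature)
```

Here geneA fails the read filter and geneD fails the ratio filter, so only the remaining features would inform the dropout fit in this toy setup.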
Dropout is the disproportionate loss of labeled RNA, or of reads from said RNA,
described independently here
and here. It can originate from a combination of
bioinformatic (loss of high mutation content reads due to alignment problems),
technical (loss of labeled RNA during RNA extraction), and biological (transcriptional
shutoff in rare cases caused by metabolic label toxicity) sources.
CorrectDropout() compares label-fed and label-free controls from the same
experimental conditions to estimate and correct for this dropout. It assumes
that there is a single number (referred to as the dropout rate, or pdo) which
describes the rate at which labeled RNA is lost (relative to unlabeled RNA).
pdo ranges from 0 (no dropout) to 1 (complete loss of all labeled RNA), and
is thus interpreted as the fraction of labeled RNA/reads from labeled RNA
disproportionately lost, relative to the equivalent unlabeled species.
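One way to make the single-rate assumption concrete is the following base R sketch. It assumes that labeled RNA survives with probability 1 - pdo while unlabeled RNA is always retained; the function names `observed_fraction()` and `corrected_fraction()` are hypothetical and are not part of the EZbakR API, and the exact correction applied internally may differ.

```r
# Toy model (an assumption for illustration, not EZbakR's internal code):
# labeled RNA survives with probability 1 - pdo, unlabeled RNA always survives.
observed_fraction <- function(theta, pdo) {
  theta * (1 - pdo) / (1 - theta * pdo)
}

# Algebraic inverse of observed_fraction() for a known pdo.
corrected_fraction <- function(theta_obs, pdo) {
  theta_obs / ((1 - pdo) + theta_obs * pdo)
}

theta_true <- c(0.1, 0.5, 0.9)  # true fractions of labeled RNA
pdo <- 0.3                      # assumed dropout rate

theta_obs <- observed_fraction(theta_true, pdo)
theta_back <- corrected_fraction(theta_obs, pdo)

# Dropout biases the observed fractions downward;
# applying the inverse with the same pdo recovers the true values.
print(round(theta_obs, 3))
print(all.equal(theta_back, theta_true))
```

Under this model the bias depends on the fraction itself, which is why a single estimated pdo is enough to correct every feature.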
# Simulate data to analyze
simdata <- EZSimulate(30)
# Create EZbakR input
ezbdo <- EZbakRData(simdata$cB, simdata$metadf)
# Estimate Fractions
ezbdo <- EstimateFractions(ezbdo)
# Correct for dropout
ezbdo <- CorrectDropout(ezbdo)