aovBioCond: Perform a Moderated Analysis of Variance on `bioCond` Objects

Description

Given a set of bioCond objects with which a mean-variance curve is associated, aovBioCond performs a one-way ANOVA-like analysis on them. More specifically, it separately tests for each genomic interval the null hypothesis that mean signal intensity in the interval remains invariant across all the biological conditions.

Usage

aovBioCond(conds, min.var = 0, df.prior = NULL)

Value

aovBioCond returns an object of class c("aovBioCond", "data.frame"), recording the test results for each genomic interval in one row. The data frame consists of the following variables:

conds.mean: Mean signal intensity at the interval across biological conditions.
between.ms: Between-condition mean square, as in an ordinary one-way ANOVA.
within.ms: Within-condition mean square, as in an ordinary one-way ANOVA.
prior.var: Prior variance deduced by reading from the mean-variance curve associated with the bioCond objects in conds.
posterior.var: A weighted average of within.ms and prior.var, with the weights being proportional to their respective numbers of degrees of freedom.
mod.f: Moderated F statistic, which is the ratio of between.ms to posterior.var.
pval: P-value for the statistical significance of this moderated F statistic.
padj: P-value adjusted for multiple testing with the "BH" method (see p.adjust), which controls false discovery rate.

Row names of the returned data frame are inherited from those of conds[[1]]$norm.signal. In addition, several attributes are added to the returned object:

bioCond.names: Names of the bioCond objects in conds.
mean.var.curve: A function representing the mean-variance curve. It accepts a vector of mean signal intensities and returns the corresponding prior variances. Note that this function has incorporated the min.var argument.
df: A length-4 vector giving the numbers of degrees of freedom of between.ms, within.ms, prior.var and posterior.var.

Arguments

conds: A list of bioCond objects on which the analysis of variance is to be performed. They must be associated with the same mean-variance curve (i.e., they must have the same "mvcID"; see also fitMeanVarCurve).
min.var: Lower bound of variances read from the mean-variance curve. Any variance read from the curve that is less than min.var is raised to this value. It is primarily used to obtain the prior variances safely and to take into account the practical significance of a signal variation.
df.prior: Number of prior degrees of freedom associated with the mean-variance curve. Must be non-negative. Can be set to Inf (see "Details"). By default, aovBioCond checks if all the bioConds in conds have the same "df.prior" component, and uses it as the number of prior degrees of freedom if yes (an error is raised otherwise).

Details

aovBioCond adopts the modeling strategy implemented in limma (see "References"), except that each interval has its own prior variance, which is read from the mean-variance curve associated with the bioCond objects. Technically, this function calculates a moderated F statistic for each genomic interval to test the null hypothesis. The moderated F statistic is simply the F statistic from an ordinary one-way ANOVA with its denominator (i.e., sample variance) replaced by posterior variance, which is defined to be a weighted average of sample and prior variances, with the weights being proportional to their respective numbers of degrees of freedom. This method of incorporating the prior information increases the statistical power for the tests.
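The definition above can be written out as follows. The symbols are introduced here for illustration only: $s_w^2$ and $d_w$ stand for within.ms and its degrees of freedom, $s_0^2$ and $d_0$ for prior.var and df.prior, and $\mathrm{MS}_b$ and $d_b$ for between.ms and its degrees of freedom. The reference distribution in the last line is the one a limma-style model would suggest and is stated as an assumption, not as a description of the package internals.

```latex
\begin{align*}
\tilde{s}^2 &= \frac{d_0\, s_0^2 + d_w\, s_w^2}{d_0 + d_w}
  && \text{(posterior.var)} \\
F_{\mathrm{mod}} &= \frac{\mathrm{MS}_b}{\tilde{s}^2}
  && \text{(mod.f)} \\
F_{\mathrm{mod}} &\sim F_{d_b,\; d_0 + d_w}
  && \text{(assumed null distribution)}
\end{align*}
```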

Two extreme values can be specified for the argument df.prior (number of degrees of freedom associated with the prior variances), representing two distinct cases: when it is set to 0, the prior information is not used at all, and the tests reduce to ordinary F tests in one-way ANOVA; when it is set to Inf, the denominators of the moderated F statistics are simply the prior variances, and these F statistics follow a scaled chi-squared distribution. Other values of df.prior represent intermediate cases. Note that the number of prior degrees of freedom is automatically estimated for each mean-variance curve by a specifically designed statistical method (see also fitMeanVarCurve and setMeanVarCurve) and, by default, aovBioCond uses the estimation result to perform the tests. Specifying df.prior explicitly when calling aovBioCond is strongly discouraged unless you really know what you are doing. In addition, aovBioCond does not adjust the variance ratio factors of the provided bioConds based on the specified number of prior degrees of freedom (see estimatePriorDf for a description of variance ratio factor).
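The effect of df.prior on a single interval can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: the numbers are made up, moderated_f is a hypothetical helper, and the p-value assumes the statistic is referred to an F distribution with df.prior + df.within denominator degrees of freedom.

```r
# Toy values for one genomic interval (hypothetical, not taken from H3K27Ac).
between.ms <- 2.4   # between-condition mean square
within.ms  <- 0.30  # within-condition mean square (sample variance)
prior.var  <- 0.20  # variance read from a mean-variance curve
df.between <- 2     # 3 conditions - 1
df.within  <- 2     # 5 samples - 3 conditions

# Hypothetical helper: moderated F test for a given number of prior df.
moderated_f <- function(df.prior) {
  if (is.infinite(df.prior)) {
    # With infinite prior df, the prior variance fully replaces within.ms.
    posterior.var <- prior.var
  } else {
    # Weighted average, weights proportional to the degrees of freedom.
    posterior.var <- (df.prior * prior.var + df.within * within.ms) /
      (df.prior + df.within)
  }
  mod.f <- between.ms / posterior.var
  # Assumed reference distribution; pf() accepts df2 = Inf.
  pval <- pf(mod.f, df.between, df.prior + df.within, lower.tail = FALSE)
  c(posterior.var = posterior.var, mod.f = mod.f, pval = pval)
}

res <- sapply(c(0, 10, Inf), moderated_f)
colnames(res) <- c("df.prior=0", "df.prior=10", "df.prior=Inf")
res
```

With df.prior = 0 the posterior variance equals within.ms, so the test is the ordinary one-way ANOVA F test; with df.prior = Inf it equals prior.var, and df.between times the statistic is referred to a chi-squared distribution with df.between degrees of freedom.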

Note also that, if df.prior is set to 0, at least one of the bioCond objects in conds must contain two or more ChIP-seq samples. Otherwise, there is no way to measure the variance associated with each interval, and an error is raised.

Considering the practical purpose of this analysis, which is to select genomic intervals with differential ChIP-seq signals between at least one pair of the biological conditions, intervals not occupied by any of the bioCond objects in conds may be filtered out before making the selections. Re-adjusting the p-values of the remaining intervals reduces the multiple-testing burden and can thereby improve the statistical power of the tests.
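The filtering step can be sketched with base R as follows. The p-values and occupancy indicators below are made-up toy data, and the variable names are illustrative; with real results, the p-values would come from the pval column and the occupancy indicators from the bioCond objects in conds.

```r
# Toy p-values for six intervals (hypothetical).
pval <- c(1e-8, 0.002, 0.04, 0.5, 0.0003, 0.8)

# Toy occupancy indicators, one logical vector per biological condition.
occupancy <- list(
  c(TRUE,  FALSE, TRUE,  FALSE, TRUE,  FALSE),
  c(TRUE,  TRUE,  FALSE, FALSE, FALSE, FALSE),
  c(FALSE, FALSE, TRUE,  FALSE, TRUE,  FALSE)
)

# Keep intervals occupied by at least one condition.
keep <- Reduce(`|`, occupancy)

# BH adjustment over all intervals versus over the kept intervals only.
padj.all <- p.adjust(pval, method = "BH")
padj.kept <- rep(NA_real_, length(pval))
padj.kept[keep] <- p.adjust(pval[keep], method = "BH")

data.frame(pval, keep, padj.all, padj.kept)
```

In this toy data, the adjusted p-values of the kept intervals are no larger after filtering, because fewer tests enter the BH correction.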

References

Smyth, G.K., Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol, 2004. 3: p. Article3.

Tu, S., et al., MAnorm2 for quantitatively comparing groups of ChIP-seq samples. Genome Res, 2021. 31(1): p. 131-145.

Examples

data(H3K27Ac, package = "MAnorm2")
attr(H3K27Ac, "metaInfo")

## Call differential genomic intervals among GM12890, GM12891 and GM12892
## cell lines.
# Perform MA normalization and construct bioConds to represent the cell
# lines.
norm <- normalize(H3K27Ac, 4, 9)
norm <- normalize(norm, 5:6, 10:11)
norm <- normalize(norm, 7:8, 12:13)
conds <- list(GM12890 = bioCond(norm[4], norm[9], name = "GM12890"),
              GM12891 = bioCond(norm[5:6], norm[10:11], name = "GM12891"),
              GM12892 = bioCond(norm[7:8], norm[12:13], name = "GM12892"))
autosome <- !(H3K27Ac$chrom %in% c("chrX", "chrY"))
conds <- normBioCond(conds, common.peak.regions = autosome)

# Variations in ChIP-seq signals across biological replicates of a cell line
# are generally of a low level, and their relationship with the mean signal
# intensities is expected to be well modeled by the presumed parametric
# form.
conds <- fitMeanVarCurve(conds, method = "parametric", occupy.only = TRUE)
summary(conds[[1]])
plotMeanVarCurve(conds, subset = "occupied")

# Perform a moderated ANOVA on these cell lines.
res <- aovBioCond(conds)
head(res)
plot(res, padj = 1e-6)