Empirical Bayes Statistics for Differential Expression
Given a microarray linear model fit, compute moderated t-statistics, moderated F-statistic, and log-odds of differential expression by empirical Bayes moderation of the standard errors towards a common value.
ebayes(fit, proportion=0.01, stdev.coef.lim=c(0.1,4), trend=FALSE, robust=FALSE, winsor.tail.p=c(0.05,0.1))
eBayes(fit, proportion=0.01, stdev.coef.lim=c(0.1,4), trend=FALSE, robust=FALSE, winsor.tail.p=c(0.05,0.1))
treat(fit, lfc=0, trend=FALSE, robust=FALSE, winsor.tail.p=c(0.05,0.1))
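- fit: an MArrayLM fitted model object produced by lmFit or contrasts.fit. For ebayes only, fit can alternatively be an unclassed list produced by lm.series, gls.series or mrlm containing components coefficients, stdev.unscaled, sigma and df.residual.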
- proportion: numeric value between 0 and 1, assumed proportion of genes which are differentially expressed
- stdev.coef.lim: numeric vector of length 2, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes
- trend: logical, should an intensity-trend be allowed for the prior variance? Default is that the prior variance is constant.
- robust: logical, should the estimation of var.prior be robustified against outlier sample variances?
- winsor.tail.p: numeric vector of length 1 or 2, giving left and right tail proportions of x to Winsorize. Used only when robust=TRUE.
- lfc: the minimum log2-fold-change that is considered scientifically meaningful
These functions are used to rank genes in order of evidence for differential expression.
They use an empirical Bayes method to shrink the probe-wise sample variances towards a common value and to augment the degrees of freedom for the individual variances (Smyth, 2004).
The functions accept as input argument fit a fitted model object from the functions lmFit, lm.series, mrlm or gls.series.
The fitted model object may have been processed by contrasts.fit before being passed to eBayes to convert the coefficients of the design matrix into an arbitrary number of contrasts which are to be tested equal to zero.
The columns of fit define a set of contrasts which are to be tested equal to zero.
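The contrasts workflow described above can be sketched as follows. This is a minimal illustration assuming the limma package is installed. The simulated matrix y, the group labels A, B and C, and the contrast names BvsA and CvsA are invented for the example and do not come from this help page.

```r
library(limma)

# Hypothetical data: 50 genes on 6 arrays, two arrays per group
set.seed(1)
y <- matrix(rnorm(50 * 6, sd = 0.3), 50, 6)
group <- factor(rep(c("A", "B", "C"), each = 2))

# Group-means design: one coefficient per group
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)
fit <- lmFit(y, design)

# Convert the three group means into two contrasts of interest
cont <- makeContrasts(BvsA = B - A, CvsA = C - A, levels = design)
fit2 <- contrasts.fit(fit, cont)

# Moderate the standard errors; each column of fit2 is now a contrast
fit2 <- eBayes(fit2)
colnames(fit2$t)
topTable(fit2, coef = "BvsA", number = 5)
```

After contrasts.fit, the columns of the fitted object are the contrasts rather than the original design coefficients, so the moderated statistics refer to BvsA and CvsA.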
The empirical Bayes moderated t-statistics test each individual contrast equal to zero. For each probe (row), the moderated F-statistic tests whether all the contrasts are zero. The F-statistic is an overall test computed from the set of t-statistics for that probe. This is exactly analogous to the relationship between t-tests and F-statistics in conventional anova, except that the residual mean squares and residual degrees of freedom have been moderated between probes.
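The link between the moderated t-statistics and the moderated F-statistic is easiest to see when the coefficient estimates are uncorrelated, in which case the F-statistic should reduce to the mean of the squared t-statistics for each probe. The sketch below assumes limma is installed and uses an invented two-group design with one column per group mean, chosen only because it gives uncorrelated estimates. Testing group means against zero is not a typical analysis.

```r
library(limma)

# Hypothetical data: 50 genes, two groups of three arrays
set.seed(1)
y <- matrix(rnorm(50 * 6, sd = 0.3), 50, 6)
group <- factor(rep(c("A", "B"), each = 3))

# One column per group mean, so the two estimates are uncorrelated
design <- model.matrix(~0 + group)
fit <- eBayes(lmFit(y, design))

# Mean of the squared moderated t-statistics across the two columns
f_from_t <- rowMeans(fit$t^2)

# Compare with the moderated F-statistic stored by eBayes
all.equal(as.vector(fit$F), as.vector(f_from_t))
```

With correlated coefficients the F-statistic still depends only on the t-statistics and their correlation matrix, but it is no longer a simple average.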
df.prior are computed by
s2.post is the weighted average of s2.prior and sigma^2 with weights proportional to df.prior and df.residual respectively.
The lods is sometimes known as the B-statistic.
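Because the B-statistic is a log-odds on the natural log scale, it can be mapped to a probability with the logistic function. The sketch below uses made-up lods values. The resulting probabilities are only as reliable as the assumed proportion of differentially expressed genes.

```r
# Hypothetical B-statistics (log-odds of differential expression)
lods <- c(-4.6, 0, 2.2)

# Logistic transform: exp(lods) / (1 + exp(lods))
post_prob <- plogis(lods)
round(post_prob, 3)
```

A lods of zero corresponds to even odds, that is, a probability of one half under the model.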
F are computed by
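The variance moderation can be written out in a few lines of base R. This sketch follows the weighted average from Smyth (2004), in which the posterior variance combines the prior variance and the residual variance with weights given by their degrees of freedom. All numbers are invented, and the variable names mirror the fitted-object components only for readability.

```r
# Hypothetical residual variances for three genes, each on 5 residual df
sigma2 <- c(0.02, 0.10, 0.50)
df_residual <- 5

# Hypothetical prior estimates shared by all genes
s2_prior <- 0.09
df_prior <- 4

# Posterior variance: weighted average of prior and residual variances
s2_post <- (df_prior * s2_prior + df_residual * sigma2) / (df_prior + df_residual)

# Ordinary versus moderated t for the same hypothetical coefficient
coef <- c(0.8, 0.8, 0.8)
stdev_unscaled <- 1 / sqrt(6)
t_ordinary <- coef / (stdev_unscaled * sqrt(sigma2))
t_moderated <- coef / (stdev_unscaled * sqrt(s2_post))

# The moderated t is referred to a t-distribution on the augmented df
df_total <- df_prior + df_residual
cbind(sigma2, s2_post, t_ordinary, t_moderated)
```

Each posterior variance lies between the gene's own variance and the prior value, so very small variances are pulled up and very large ones are pulled down.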
eBayes doesn't compute ordinary (unmoderated) t-statistics by default, but these can be easily extracted from the linear model output, see the example below.
ebayes is the earlier and leaner function, kept for backward compatibility, while eBayes is the later, more object-oriented version.
The difference is that ebayes outputs only the empirical Bayes statistics whereas eBayes adds them to the fitted model object.
eBayes is recommended for routine use as it produces objects containing all the necessary components for downstream analysis.
treat computes empirical Bayes moderated-t p-values relative to a minimum required fold-change threshold.
Use topTreat to summarize output from treat.
Instead of testing for genes which have log-fold-changes different from zero, it tests whether the log2-fold-change is greater than lfc in absolute value (McCarthy and Smyth, 2009).
treat is concerned with p-values rather than posterior odds, so it does not compute the B-statistic lods.
The idea of thresholding doesn't apply to F-statistics in a straightforward way, so moderated F-statistics are also not computed.
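A short sketch of testing relative to a threshold, assuming limma is installed. The simulated data follow the same pattern as the example at the end of this page, and the threshold log2(1.2) is an arbitrary choice made only for illustration.

```r
library(limma)

# Hypothetical data: 100 genes on 6 arrays, gene 1 shifted upwards
set.seed(2004)
M <- matrix(rnorm(100 * 6, sd = 0.3), 100, 6)
M[1, ] <- M[1, ] + 1
fit <- lmFit(M)

# Test whether |log2-fold-change| exceeds log2(1.2) rather than zero
fit_treat <- treat(fit, lfc = log2(1.2))

# Summarize with topTreat; there is no B-statistic or F-statistic to report
top <- topTreat(fit_treat, coef = 1, number = 5)
top
is.null(fit_treat$lods)
is.null(fit_treat$F)
```

Raising lfc makes the test more stringent, so genes with small but consistent changes drop down the ranking.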
If trend=TRUE then an intensity-dependent trend is fitted to the prior variances s2.prior.
Specifically, squeezeVar is called with the covariate equal to Amean, the average log2-intensity for each gene.
See squeezeVar for more details.
If robust=TRUE then the robust empirical Bayes procedure of Phipson et al (2013) is used.
See squeezeVar for more details.
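The effect of the trend option on the prior variance can be seen by comparing the shape of s2.prior with and without it. This sketch assumes limma is installed and that lmFit stores the row means of a plain matrix as Amean. The per-gene intensity offset is invented so that Amean varies between genes.

```r
library(limma)

# Hypothetical data: 100 genes on 6 arrays with varying average intensity
set.seed(2004)
M <- matrix(rnorm(100 * 6, sd = 0.3), 100, 6)
M <- M + runif(100, 4, 12)   # recycled down columns: one offset per gene
fit <- lmFit(M)

# Constant prior variance (the default)
fit_const <- eBayes(fit)

# Prior variance allowed to depend on Amean
fit_trend <- eBayes(fit, trend = TRUE)

length(fit_const$s2.prior)
length(fit_trend$s2.prior)
```

The robust option is requested in the same way, with robust=TRUE, and can be combined with trend=TRUE.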
eBayes produces an object of class MArrayLM (see MArrayLM-class) containing everything found in fit plus the following added components:
- t: numeric vector or matrix of moderated t-statistics
- p.value: numeric vector of p-values corresponding to the t-statistics
- s2.prior: estimated prior value for sigma^2. A vector if covariate is non-NULL, otherwise a scalar.
- df.prior: degrees of freedom associated with s2.prior
- df.total: numeric vector of total degrees of freedom associated with t-statistics and p-values. Equal to df.prior+df.residual or sum(df.residual), whichever is smaller.
- s2.post: numeric vector giving the posterior values for sigma^2
- lods: numeric vector or matrix giving the log-odds of differential expression
- var.prior: estimated prior value for the variance of the log2-fold-change for differentially expressed genes
- F: numeric vector of moderated F-statistics for testing all contrasts defined by the columns of fit simultaneously equal to zero
- F.p.value: numeric vector giving p-values corresponding to F
treat produces an MArrayLM object similar to that produced by eBayes but without lods, var.prior, F or F.p.value.
ebayes produces an ordinary list containing the above components except for F and F.p.value.
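The returned components can be inspected directly on a fitted object. The sketch below assumes limma is installed and uses simulated data. The component names checked are those that eBayes is understood to add to the fitted object.

```r
library(limma)

# Hypothetical data: 100 genes on 6 arrays
set.seed(2004)
M <- matrix(rnorm(100 * 6, sd = 0.3), 100, 6)
fit <- eBayes(lmFit(M))

# The result is still an MArrayLM object
class(fit)

# Components added by eBayes
added <- c("t", "p.value", "s2.prior", "df.prior", "df.total",
           "s2.post", "lods", "var.prior", "F", "F.p.value")
added %in% names(fit)

# df.total is capped at the pooled residual degrees of freedom
df_total_check <- pmin(fit$df.prior + fit$df.residual, sum(fit$df.residual))
all.equal(as.vector(fit$df.total), as.vector(df_total_check))
```

Because the added components sit alongside the original fit, functions such as topTable can be applied to the result without further processing.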
McCarthy, D. J., and Smyth, G. K. (2009). Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 25, 765-771. http://bioinformatics.oxfordjournals.org/content/25/6/765
Loennstedt, I., and Speed, T. P. (2002). Replicated microarray data. Statistica Sinica 12, 31-46.
Phipson, B., Lee, S., Majewski, I. J., Alexander, W. S., and Smyth, G. K. (2013). Empirical Bayes in the presence of exceptional cases, with application to microarray data. Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia. http://www.statsci.org/smyth/pubs/RobustEBayesPreprint.pdf
Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, Volume 3, Article 3. http://www.statsci.org/smyth/pubs/ebayes.pdf
An overview of linear model functions in limma is given by 06.LinearModels.
# See also lmFit examples

# Simulate gene expression data,
# 6 microarrays and 100 genes with one gene differentially expressed
set.seed(2004); invisible(runif(100))
M <- matrix(rnorm(100*6,sd=0.3),100,6)
M[1,] <- M[1,] + 1
fit <- lmFit(M)

# Moderated t-statistic
fit <- eBayes(fit)
topTable(fit)

# Ordinary t-statistic
ordinary.t <- fit$coef / fit$stdev.unscaled / fit$sigma

# Q-Q plots of t statistics
# Points off the line may be differentially expressed
par(mfrow=c(1,2))
qqt(ordinary.t, df=fit$df.residual, main="Ordinary t")
abline(0,1)
qqt(fit$t, df=fit$df.total,main="Moderated t")
abline(0,1)
par(mfrow=c(1,1))