# ebayes

##### Empirical Bayes Statistics for Differential Expression

Given a microarray linear model fit, compute moderated t-statistics, moderated F-statistic, and log-odds of differential expression by empirical Bayes moderation of the standard errors towards a common value.

- Keywords
- htest

##### Usage

```
ebayes(fit, proportion=0.01, stdev.coef.lim=c(0.1,4), trend=FALSE, robust=FALSE, winsor.tail.p=c(0.05,0.1))
eBayes(fit, proportion=0.01, stdev.coef.lim=c(0.1,4), trend=FALSE, robust=FALSE, winsor.tail.p=c(0.05,0.1))
treat(fit, lfc=0, trend=FALSE, robust=FALSE, winsor.tail.p=c(0.05,0.1))
```

##### Arguments

- fit
- an
`MArrayLM`

fitted model object produced by`lmFit`

or`contrasts.fit`

. For`ebayes`

only,`fit`

can alternatively be an unclassed list produced by`lm.series`

,`gls.series`

or`mrlm`

containing components`coefficients`

,`stdev.unscaled`

,`sigma`

and`df.residual`

. - proportion
- numeric value between 0 and 1, assumed proportion of genes which are differentially expressed
- stdev.coef.lim
- numeric vector of length 2, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes
- trend
- logical, should an intensity-trend be allowed for the prior variance? Default is that the prior variance is constant.
- robust
- logical, should the estimation of
`df.prior`

and`var.prior`

be robustified against outlier sample variances? - winsor.tail.p
- numeric vector of length 1 or 2, giving left and right tail proportions of
`x`

to Winsorize. Used only when`robust=TRUE`

. - lfc
- the minimum log2-fold-change that is considered scientifically meaningful

##### Details

These functions is used to rank genes in order of evidence for differential expression.
They use an empirical Bayes method to shrink the probe-wise sample variances towards a common value and to augmenting the degrees of freedom for the individual variances (Smyth, 2004).
The functions accept as input argument `fit`

a fitted model object from the functions `lmFit`

, `lm.series`

, `mrlm`

or `gls.series`

.
The fitted model object may have been processed by `contrasts.fit`

before being passed to `eBayes`

to convert the coefficients of the design matrix into an arbitrary number of contrasts which are to be tested equal to zero.
The columns of `fit`

define a set of contrasts which are to be tested equal to zero.

The empirical Bayes moderated t-statistics test each individual contrast equal to zero. For each probe (row), the moderated F-statistic tests whether all the contrasts are zero. The F-statistic is an overall test computed from the set of t-statistics for that probe. This is exactly analogous the relationship between t-tests and F-statistics in conventional anova, except that the residual mean squares and residual degrees of freedom have been moderated between probes.

The estimates `s2.prior`

and `df.prior`

are computed by `fitFDist`

.
`s2.post`

is the weighted average of `s2.prior`

and `sigma^2`

with weights proportional to `df.prior`

and `df.residual`

respectively.
The `lods`

is sometimes known as the B-statistic.
The F-statistics `F`

are computed by `classifyTestsF`

with `fstat.only=TRUE`

.

`eBayes`

doesn't compute ordinary (unmoderated) t-statistics by default, but these can be easily extracted from
the linear model output, see the example below.

`ebayes`

is the earlier and leaner function, kept for background capatability, while
`eBayes`

is the later more object-orientated version.
The difference is that `ebayes`

outputs only the empirical Bayes statistics whereas `eBayes`

adds them to the fitted model object `fit`

.
`eBayes`

is recommended for routine use as it produces objects containing all the necessary components for downstream analysis

`treat`

computes empirical Bayes moderated-t p-values relative to a minimum required fold-change threshold.
Use `topTreat`

to summarize output from `treat`

.
Instead of testing for genes which have log-fold-changes different from zero, it tests whether the log2-fold-change is greater than `lfc`

in absolute value (McCarthy and Smyth, 2009).
`treat`

is concerned with p-values rather than posterior odds, so it does not compute the B-statistic `lods`

.
The idea of thresholding doesn't apply to F-statistics in a straightforward way, so moderated F-statistics are also not computed.

If `trend=TRUE`

then an intensity-dependent trend is fitted to the prior variances `s2.prior`

.
Specifically, `squeezeVar`

is called with the `covariate`

equal to `Amean`

, the average log2-intensity for each gene.
See `squeezeVar`

for more details.

If `robust=TRUE`

then the robust empirical Bayes procedure of Phipson et al (2013) is used.
See `squeezeVar`

for more details.

##### Value

- t
- numeric vector or matrix of moderated t-statistics
- p.value
- numeric vector of p-values corresponding to the t-statistics
- s2.prior
- estimated prior value for
`sigma^2`

. A vector if`covariate`

is non-`NULL`

, otherwise a scalar. - df.prior
- degrees of freedom associated with
`s2.prior`

- df.total
- numeric vector of total degrees of freedom associated with t-statistics and p-values. Equal to
`df.prior+df.residual`

or`sum(df.residual)`

, whichever is smaller. - s2.post
- numeric vector giving the posterior values for
`sigma^2`

- lods
- numeric vector or matrix giving the log-odds of differential expression
- var.prior
- estimated prior value for the variance of the log2-fold-change for differentially expressed gene
- F
- numeric vector of moderated F-statistics for testing all contrasts defined by the columns of
`fit`

simultaneously equal to zero - F.p.value
- numeric vector giving p-values corresponding to
`F`

`eBayes`

produces an object of class `MArrayLM`

(see `MArrayLM-class`

) containing everything found in `fit`

plus the following added components:
`treat`

a produces an `MArrayLM`

object similar to `eBayes`

but without `lods`

, `var.prior`

, `F`

or `F.p.value`

.
`ebayes`

produces an ordinary list containing the above components except for `F`

and `F.p.value`

.
##### References

McCarthy, D. J., and Smyth, G. K. (2009).
Testing significance relative to a fold-change threshold is a TREAT.
*Bioinformatics* 25, 765-771.
http://bioinformatics.oxfordjournals.org/content/25/6/765

Loennstedt, I., and Speed, T. P. (2002). Replicated microarray data. *Statistica Sinica* **12**, 31-46.

Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2013). Empirical Bayes in the presence of exceptional cases, with application to microarray data. Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia. http://www.statsci.org/smyth/pubs/RobustEBayesPreprint.pdf

Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments.
*Statistical Applications in Genetics and Molecular Biology*, Volume **3**, Article 3.
http://www.statsci.org/smyth/pubs/ebayes.pdf

##### See Also

`squeezeVar`

, `fitFDist`

, `tmixture.matrix`

.

An overview of linear model functions in limma is given by 06.LinearModels.

##### Examples

```
# See also lmFit examples
# Simulate gene expression data,
# 6 microarrays and 100 genes with one gene differentially expressed
set.seed(2004); invisible(runif(100))
M <- matrix(rnorm(100*6,sd=0.3),100,6)
M[1,] <- M[1,] + 1
fit <- lmFit(M)
# Moderated t-statistic
fit <- eBayes(fit)
topTable(fit)
# Ordinary t-statistic
ordinary.t <- fit$coef / fit$stdev.unscaled / fit$sigma
# Q-Q plots of t statistics
# Points off the line may be differentially expressed
par(mfrow=c(1,2))
qqt(ordinary.t, df=fit$df.residual, main="Ordinary t")
abline(0,1)
qqt(fit$t, df=fit$df.total,main="Moderated t")
abline(0,1)
par(mfrow=c(1,1))
```

*Documentation reproduced from package limma, version 3.22.7, License: GPL (>=2)*