DESeq
Differential expression analysis based on the Negative Binomial (a.k.a. GammaPoisson) distribution
This function performs a default analysis through the steps:
 estimation of size factors:
estimateSizeFactors
 estimation of dispersion:
estimateDispersions
 Negative Binomial GLM fitting and Wald statistics:
nbinomWaldTest
For complete details on each step, see the manual pages of the respective
functions. After the DESeq
function returns a DESeqDataSet object,
results tables (log2 fold changes and pvalues) can be generated
using the results
function. See the manual page
for results
for information on independent filtering and
pvalue adjustment for multiple test correction.
Usage
DESeq(object, test = c("Wald", "LRT"), fitType = c("parametric", "local", "mean"), betaPrior, full = design(object), reduced, quiet = FALSE, minReplicatesForReplace = 7, modelMatrixType, parallel = FALSE, BPPARAM = bpparam())
Arguments
 object
 a DESeqDataSet object, see the constructor functions
DESeqDataSet
,DESeqDataSetFromMatrix
,DESeqDataSetFromHTSeqCount
.  test
 either "Wald" or "LRT", which will then use either
Wald significance tests (defined by
nbinomWaldTest
), or the likelihood ratio test on the difference in deviance between a full and reduced model formula (defined bynbinomLRT
)  fitType
 either "parametric", "local", or "mean"
for the type of fitting of dispersions to the mean intensity.
See
estimateDispersions
for description.  betaPrior
 whether or not to put a zeromean normal prior on
the nonintercept coefficients
See
nbinomWaldTest
for description of the calculation of the beta prior. By default, the beta prior is used only for the Wald test, but can also be specified for the likelihood ratio test.  full
 for
test="LRT"
, the full model formula, which is restricted to the formula indesign(object)
. alternatively, it can be a model matrix constructed by the user. advanced use: specifying a model matrix for full andtest="Wald"
is possible ifbetaPrior=FALSE
 reduced
 for
test="LRT"
, a reduced formula to compare against, i.e., the full formula with the term(s) of interest removed. alternatively, it can be a model matrix constructed by the user  quiet
 whether to print messages at each step
 minReplicatesForReplace
 the minimum number of replicates required
in order to use
replaceOutliers
on a sample. If there are samples with so many replicates, the model will be refit after these replacing outliers, flagged by Cook's distance. Set toInf
in order to never replace outliers.  modelMatrixType
 either "standard" or "expanded", which describe
how the model matrix, X of the GLM formula is formed.
"standard" is as created by
model.matrix
using the design formula. "expanded" includes an indicator variable for each level of factors in addition to an intercept. for more information see the Description ofnbinomWaldTest
. betaPrior must be set to TRUE in order for expanded model matrices to be fit.  parallel
 if FALSE, no parallelization. if TRUE, parallel
execution using
BiocParallel
, see next argumentBPPARAM
. A note on running in parallel usingBiocParallel
: it may be advantageous to remove large, unneeded objects from your current R environment before callingDESeq
, as it is possible that R's internal garbage collection will copy these files while running on worker nodes.  BPPARAM
 an optional parameter object passed internally
to
bplapply
whenparallel=TRUE
. If not specified, the parameters last registered withregister
will be used.
Details
The differential expression analysis uses a generalized linear model of the form:
$$ K_{ij} \sim \textrm{NB}( \mu_{ij}, \alpha_i) $$ $$ \mu_{ij} = s_j q_{ij} $$ $$ \log_2(q_{ij}) = x_{j.} \beta_i $$
where counts $K_ij$ for gene i, sample j are modeled using
a Negative Binomial distribution with fitted mean $mu_ij$
and a genespecific dispersion parameter $alpha_i$.
The fitted mean is composed of a samplespecific size factor
$s_j$ and a parameter $q_ij$ proportional to the
expected true concentration of fragments for sample j.
The coefficients $beta_i$ give the log2 fold changes for gene i for each
column of the model matrix $X$.
The samplespecific size factors can be replaced by
genespecific normalization factors for each sample using
normalizationFactors
.
For details on the fitting of the log2 fold changes and calculation of pvalues,
see nbinomWaldTest
if using test="Wald"
,
or nbinomLRT
if using test="LRT"
.
Experiments without replicates do not allow for estimation of the dispersion
of counts around the expected value for each group, which is critical for
differential expression analysis. If an experimental design is
supplied which does not contain the necessary degrees of freedom for differential
analysis, DESeq
will provide a message to the user and follow
the strategy outlined in Anders and Huber (2010)
under the section 'Working without replicates', wherein all the samples
are considered as replicates of a single group for the estimation of dispersion.
As noted in the reference above: "Some overestimation of the variance
may be expected, which will make that approach conservative."
Furthermore, "while one may not want to draw strong conclusions from such an analysis,
it may still be useful for exploration and hypothesis generation."
The argument minReplicatesForReplace
is used to decide which samples
are eligible for automatic replacement in the case of extreme Cook's distance.
By default, DESeq
will replace outliers if the Cook's distance is
large for a sample which has 7 or more replicates (including itself).
This replacement is performed by the replaceOutliers
function. This default behavior helps to prevent filtering genes
based on Cook's distance when there are many degrees of freedom.
See results
for more information about filtering using
Cook's distance, and the 'Dealing with outliers' section of the vignette.
Unlike the behavior of replaceOutliers
, here original counts are
kept in the matrix returned by counts
, original Cook's
distances are kept in assays(dds)[["cooks"]]
, and the replacement
counts used for fitting are kept in assays(dds)[["replaceCounts"]]
.
Note that if a log2 fold change prior is used (betaPrior=TRUE)
then expanded model matrices will be used in fitting. These are
described in nbinomWaldTest
and in the vignette. The
contrast
argument of results
should be used for
generating results tables.
Value

a
DESeqDataSet
object with results stored as
metadata columns. These results should accessed by calling the results
function. By default this will return the log2 fold changes and pvalues for the last
variable in the design formula. See results
for how to access results
for other variables.
References
Michael I Love, Wolfgang Huber, Simon Anders: Moderated estimation of fold change and dispersion for RNAseq data with DESeq2. Genome Biology 2014, 15:550. http://dx.doi.org/10.1186/s1305901405508
See Also
Examples
# see vignette for suggestions on generating
# count tables from RNASeq data
cnts < matrix(rnbinom(n=1000, mu=100, size=1/0.5), ncol=10)
cond < factor(rep(1:2, each=5))
# object construction
dds < DESeqDataSetFromMatrix(cnts, DataFrame(cond), ~ cond)
# standard analysis
dds < DESeq(dds)
res < results(dds)
# an alternate analysis: likelihood ratio test
ddsLRT < DESeq(dds, test="LRT", reduced= ~ 1)
resLRT < results(ddsLRT)