nbp.test calls prepare.nbp to create the NBP data
structure, perform optional normalization, and adjust library sizes;
it then calls estimate.disp to estimate the NBP dispersion parameters and
exact.nb.test to perform the exact NB test for differential
gene expression on each gene. The results are summarized using p-values and q-values
(FDR).
Overview{
For assessing evidence for differential gene expression from RNA-Seq
read counts, it is critical to adequately model the count variability
between independent biological replicates. The negative binomial (NB)
distribution offers a more realistic model for RNA-Seq count
variability than the Poisson distribution and still permits an exact
(non-asymptotic) test for comparing two groups.
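As a brief illustration of what extra-Poisson variability means, assume the common mean-dispersion parameterization of the NB distribution (the symbols $Y$, $\mu$ and $\phi$ are used here for illustration only). A Poisson count $Y$ with mean $\mu$ has variance $\mu$, whereas an NB count with mean $\mu$ and dispersion $\phi \ge 0$ has

$$
\mathrm{Var}(Y) = \mu + \phi\,\mu^{2},
$$

so the excess over the Poisson variance is $\phi\mu^2$, which vanishes as $\phi$ tends to $0$.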
For each individual gene $i$, an NB distribution uses a dispersion
parameter $\phi_i$ to model the extra-Poisson variation between
biological replicates. Across all genes, the NBP parameterization of
the NB distribution (the NBP model) uses two parameters $(\phi,
\alpha)$ to model extra-Poisson variation over the entire range of
expression levels. The NBP model allows the NB dispersion parameter to
be an arbitrary power function of the mean ($\phi_i =
\phi\mu_i^{\alpha-2}$). The NBP model includes the Poisson model as a
limiting case (as $\phi$ tends to $0$) and the NB2 model as a
special case (when $\alpha=2$). Under the NB2 model, the
dispersion parameter is a constant and does not vary with the mean
expression levels. The NBP model is more flexible and is the recommended
default option.}
Count Normalization{
We measure gene expression by the relative frequency of RNA-Seq reads
mapped to a gene, that is, the read count divided by the library size
(the column sum of the count matrix). Since the relative frequencies sum
to 1 in each library (one column of the count matrix), the increased
relative frequencies of truly overexpressed genes in a column must be
accompanied by decreased relative frequencies of other genes, even
when those other genes are not truly differentially expressed. Robinson and
Oshlack (2010) presented examples where this problem is
noticeable.
A simple fix is to compute the relative frequencies with respect to
effective library sizes, that is, library sizes multiplied by
normalization factors.
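The effect of effective library sizes can be seen in a toy example. The sketch below is written in plain Python rather than R, purely for illustration; the count matrix, the helper names (`column_sums`, `relative_frequencies`) and the hand-picked factor 0.4 are all assumptions made for this example and are not part of nbp.test.

```python
# Toy count matrix: rows are genes, columns are two libraries.
# Genes 0-2 are unchanged; gene 3 is strongly overexpressed in library 2.
counts = [
    [100, 100],
    [100, 100],
    [100, 100],
    [100, 700],
]


def column_sums(matrix):
    """Library sizes: the sum of each column of the count matrix."""
    return [sum(col) for col in zip(*matrix)]


def relative_frequencies(matrix, norm_factors=None):
    """Divide each count by its effective library size.

    The effective library size is the column sum times the normalization
    factor; with no factors given, every factor is 1 (no normalization).
    """
    lib_sizes = column_sums(matrix)
    if norm_factors is None:
        norm_factors = [1.0] * len(lib_sizes)
    eff_sizes = [s * f for s, f in zip(lib_sizes, norm_factors)]
    return [[c / e for c, e in zip(row, eff_sizes)] for row in matrix]


# Without normalization, gene 0 seems to drop in library 2 even though
# its count is identical in both libraries.
raw = relative_frequencies(counts)

# Hand-picked factor (an assumption for this toy data): 0.4 shrinks the
# second library's effective size from 1000 to 400.
adjusted = relative_frequencies(counts, norm_factors=[1.0, 0.4])

print(raw[0])
print(adjusted[0])
```

With the adjusted sizes, the three unchanged genes have the same relative frequency in both libraries, and only gene 3 stands out.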
By default, nbp.test assumes the normalization factors are 1 (i.e., no
normalization is needed). Users can specify normalization factors through the argument norm.factors.
Many authors (e.g., Robinson and Oshlack, 2010; Anders and Huber,
2010) propose estimating the normalization factors under the
assumption that most genes are not differentially expressed.
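To show how the "most genes are not differentially expressed" assumption can yield a normalization factor, here is a deliberately simplified median-of-ratios sketch in plain Python. It is not the method of Robinson and Oshlack (2010) or of Anders and Huber (2010), and it is not what nbp.test does; the function name `estimate_norm_factors`, the `ref` argument and the toy data are hypothetical.

```python
from statistics import median

# Toy count matrix: rows are genes, columns are libraries.
# Only gene 3 differs between the two libraries.
counts = [
    [100, 100],
    [100, 100],
    [100, 100],
    [100, 700],
]


def estimate_norm_factors(matrix, ref=0):
    """Simplified median-of-ratios estimate of normalization factors.

    For each library, compute the per-gene ratio of its relative frequency
    to that of a reference library and use the median ratio as the factor.
    If most genes are unchanged, the median is driven by those genes.
    """
    lib_sizes = [sum(col) for col in zip(*matrix)]
    freqs = [[c / s for c, s in zip(row, lib_sizes)] for row in matrix]
    factors = []
    for j in range(len(lib_sizes)):
        # Skip genes with a zero count in either library (undefined ratio).
        ratios = [row[j] / row[ref] for row in freqs
                  if row[ref] > 0 and row[j] > 0]
        factors.append(median(ratios))
    return factors


factors = estimate_norm_factors(counts)
print(factors)
```

On this toy data the factor for the second library comes out near 0.4, so its effective library size (1000 times 0.4) matches the first library's size of 400. Factors obtained by any such method would then be supplied through norm.factors.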
}
Library Size Adjustment{
The exact test requires that the effective library sizes (column sums
of the count matrix multiplied by normalization factors) are
approximately equal. By default, nbp.test thins
(downsamples) the counts to make the effective library sizes
equal. Thinning may reduce statistical efficiency but is unlikely to
introduce bias.
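One common way to thin counts is binomial thinning, in which each read in a library is kept independently with the same probability. The plain-Python sketch below illustrates that idea under stated assumptions: it is not confirmed to be the procedure nbp.test uses, and the function name `thin_counts`, its arguments and the toy data are hypothetical.

```python
import random


def thin_counts(matrix, norm_factors=None, seed=0):
    """Binomially thin each column toward the smallest effective library size."""
    rng = random.Random(seed)
    n_libs = len(matrix[0])
    if norm_factors is None:
        norm_factors = [1.0] * n_libs
    lib_sizes = [sum(col) for col in zip(*matrix)]
    eff_sizes = [s * f for s, f in zip(lib_sizes, norm_factors)]
    target = min(eff_sizes)
    # Probability of keeping a read in each library; 1.0 for the smallest.
    keep = [target / e for e in eff_sizes]
    thinned = []
    for row in matrix:
        new_row = []
        for count, p in zip(row, keep):
            # Binomial(count, p) draw: keep each read with probability p.
            new_row.append(sum(1 for _ in range(count) if rng.random() < p))
        thinned.append(new_row)
    return thinned


# Toy data: the second library is 2.5 times as deep as the first.
counts = [
    [100, 250],
    [300, 750],
    [600, 1500],
]
thinned = thin_counts(counts)
print([sum(col) for col in zip(*thinned)])
```

Because every read in a library is kept with the same probability, the relative frequencies are preserved in expectation, which is consistent with thinning being unlikely to introduce bias. The discarded reads are the source of the efficiency loss, and the thinned library sizes are equal only in expectation under this scheme.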
}