nbp.test fits a NBP model to the RNA-Seq counts and performs Robinson and Smyth's
exact NB test on each gene to assess differential gene expression between two groups.nbp.test(counts, grp.ids, grp1, grp2, norm.factors, method.disp = "NBP",
print.level, ...)estimate.disp, the function that estimates the
dispersion parameters.nbp.test returns a list with the following components:grp1, grp2, pooled corresponding to
the two treatment groups and the pooled mean.nbp.test calls prepare.nbp to create the NBP data
structure, perform optional normalization and adjust library sizes,
calls estimate.disp to estimate the NBP dispersion parameters and
exact.nb.test to perform the exact NB test for differential
gene expression on each gene. The results are summarized using p-values and q-values
(FDR).
For each individual gene, a NB distribution uses a dispersion
parameter $\phi_i$ to model the extra-Poisson variation between
biological replicates. Across all genes, the NBP parameterization of
the NB distribution (the NBP model) uses two parameters $(\phi,
\alpha)$ to model extra-Poisson variation over the entire range of
expression levels. The NBP model allows the NB dispersion parameter to
be an arbitrary power function of the mean ($\phi_i =
\phi\mu_i^{2-\alpha}$). The NBP model includes the Poisson model as a
limiting case (as $\phi$ tends to $0$) and the NB2 model as a
special case (when $\alpha=2$). Under the NB2 model, the
dispersion parameter is a constant and does not vary with the mean
expression levels. NBP model is more flexible and is the recommended
default option.}
A simple fix is to compute the relative frequencies relative to
effective library sizes---library sizes multiplied by normalization
factors.
By default, nbp.test assumes the normalization factors are 1 (i.e. no
normalization is needed). Users can specify normalization factors through the argument norm.factors.
Many authors (Robinson and Oshlack (2010), Anders and Huber
(2010)) propose to estimate the normalization factors based on the
assumption that most genes are NOT differentially expressed.
}
nbp.test will thin
(downsample) the counts to make the effective library sizes
equal. Thinning may lose statistical efficiency, but is unlikely to
introduce bias.
}
Robinson, M. D. and G. K. Smyth (2008): "Small-sample estimation of negative binomial dispersion, with applications to SAGE data," Biostatistics, 9, 321-332.
Anders, S. and W. Huber (2010): "Differential expression analysis for sequence count data," Genome Biol., 11, R106.
Robinson, M. D. and A. Oshlack (2010): "A scaling normalization method for differential expression analysis of RNA-seq data," Genome Biol., 11, R25.
prepare.nbp, estimate.disp, exact.nb.test.## Load Arabidopsis data
data(arab);
## Specify treatment groups and ids of the two groups to be compared
grp.ids = c(1, 1, 1, 2, 2, 2);
grp1 = 1;
grp2 = 2;
## Set random number seed to make results reproducible
set.seed(999);
## Fit the NBP model and perform exact NB test for differential gene expression
res = nbp.test(arab, grp.ids, grp1, grp2, print.level = 5);
## Show top ten most differentially expressed genes
subset = order(res$p.values)[1:10];
print.nbp(res, subset);
## Count the number of differentially expressed genes (e.g. qvalue < 0.05)
alpha = 0.05;
sig.res = res$q.values < alpha;
table(sig.res);
## Show boxplots, MA-plot, mean-variance plot and mean-dispersion plot
plot.nbp(res);Run the code above in your browser using DataLab