exact.nb.test: Exact Negative Binomial Test for Differential Gene Expression

Description

exact.nb.test performs the Robinson and Smyth exact negative binomial (NB) test for differential gene expression on each gene and summarizes the results using p-values and q-values (FDR).

Usage

exact.nb.test(obj, grp1, grp2, print.level = 1)

Arguments

obj

output from estimate.disp.

grp1, grp2

identifiers of the two groups to be compared.

print.level

controls the number of messages printed: 0 suppresses all messages, 1 prints basic progress messages, and larger values print more detailed messages.

Value

exact.nb.test returns the list obj from the input with the following added components:

grp1

same as the input grp1.

grp2

same as the input grp2.

pooled.pie

estimated pooled mean of relative count frequencies in the two groups being compared.

expression.levels

a matrix of estimated gene expression levels, as indicated by mean relative read frequencies. It has three columns, grp1, grp2 and pooled, corresponding to the two treatment groups and the pooled mean.

log.fc

base 2 log fold change in mean relative frequency between the two groups.

p.values

p-values of the exact NB test applied to each gene (row).

q.values

q-values (estimated FDR) corresponding to the p-values.
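
As a rough illustration of how the p.values and q.values components might be used downstream, the sketch below selects genes at an FDR threshold. The list obj.mock and its numbers are made up for the example. The q-values are stood in for by Benjamini-Hochberg adjusted p-values from p.adjust, which is an assumption and not necessarily how the package estimates them.

```r
## Mock result list for illustration only; a real one is returned by exact.nb.test.
obj.mock <- list(p.values = c(0.0002, 0.04, 0.5, 0.8, 0.001))

## Stand-in for q-values: Benjamini-Hochberg adjusted p-values
## (an assumption; the package may estimate q-values differently).
obj.mock$q.values <- p.adjust(obj.mock$p.values, method = "BH")

## Row indices of genes called differentially expressed at an estimated FDR of 5%
sig <- which(obj.mock$q.values < 0.05)
```

With a real result, the same indexing would apply to obj$q.values, and the selected rows could then be looked up in obj$log.fc or obj$expression.levels.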

Details

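The negative binomial (NB) distribution offers a more realistic model for RNA-Seq count variability than the Poisson distribution, because it allows the variance to exceed the mean (overdispersion), and it still permits an exact (non-asymptotic) test for comparing expression levels in two groups.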

For each gene, let $S_1$ and $S_2$ be the sums of gene counts over all biological replicates in groups 1 and 2, respectively. The exact NB test is based on the conditional distribution of $S_1 | S_1+S_2$: a value of $S_1$ that is too large or too small relative to the sum $S_1+S_2$ is evidence of differential gene expression. When the effective library sizes are the same in all replicates and the dispersion parameters are known, the probability functions of $S_1$ and $S_2$ can be determined explicitly, because a sum of independent NB counts with a common mean and dispersion is again NB distributed. The exact p-value is the total conditional probability of all possible values of $(S_1, S_2)$ that have the same sum as the observed values of $(S_1, S_2)$ but are more extreme.
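
The conditional test described above can be sketched for a single gene using base R only. This is a minimal illustration, not the package's implementation. The function name exact.nb.pvalue.sketch and its arguments are hypothetical. It assumes equal library sizes, a known dispersion phi (variance = mu + phi * mu^2), a common per-replicate mean mu under the null hypothesis, and it takes "more extreme" to mean "no more probable than the observed outcome".

```r
## Conditional exact NB test for one gene (illustrative sketch, not NBPSeq code).
## s1, s2: observed count sums in groups 1 and 2
## n1, n2: number of replicates in each group
## mu:     assumed common per-replicate mean under the null hypothesis
## phi:    assumed known NB dispersion (variance = mu + phi * mu^2)
exact.nb.pvalue.sketch <- function(s1, s2, n1, n2, mu, phi) {
  s <- s1 + s2
  a <- 0:s  # all possible values of S_1 given the observed sum

  ## A sum of n i.i.d. NB counts is NB with size = n / phi and mean = n * mu.
  joint <- dnbinom(a, size = n1 / phi, mu = n1 * mu) *
    dnbinom(s - a, size = n2 / phi, mu = n2 * mu)

  ## Conditional distribution of S_1 given S_1 + S_2 = s
  cond <- joint / sum(joint)

  ## Add up the outcomes that are no more probable than the observed one;
  ## the small relative tolerance guards against floating-point ties.
  p.obs <- cond[s1 + 1]
  min(1, sum(cond[cond <= p.obs * (1 + 1e-7)]))
}

p.balanced <- exact.nb.pvalue.sketch(30, 30, n1 = 3, n2 = 3, mu = 10, phi = 0.1)
p.extreme <- exact.nb.pvalue.sketch(5, 100, n1 = 3, n2 = 3, mu = 17.5, phi = 0.1)
```

Under these assumptions both sums share the same NB success probability, so mu cancels when the joint probabilities are normalized and the result depends only on the replicate numbers and phi. For very large counts the products can underflow, in which case working on the log scale would be safer.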

Examples

## Load the NBPSeq package and the Arabidopsis data
library(NBPSeq)
data(arab)

## Specify treatment groups
grp.ids <- c(1, 1, 1, 2, 2, 2)

## Prepare an NBP object; the library sizes are adjusted by thinning the counts.
## Set a seed first so that the result is reproducible.
set.seed(999)

## For demonstration purposes, use only the first 100 rows of the arab data.
obj <- prepare.nbp(arab[1:100, ], grp.ids, print.level = 5)

## Estimate the NBP dispersion parameters
obj <- estimate.disp(obj, print.level = 5)

## Perform the exact NB test comparing group 1 with group 2
grp1 <- 1
grp2 <- 2
obj <- exact.nb.test(obj, grp1, grp2, print.level = 5)

## Print the NBP object
print.nbp(obj)