prepare.nbp(counts, grp.ids, lib.sizes = colSums(counts), norm.factors = NULL,
  thinning = TRUE, print.level = 1)

norm.factors: if NULL (default), no normalization will be applied.
thinning: if TRUE (default), the counts will be randomly downsampled to make effective library sizes approximately equal.

We take gene expression to be indicated by the relative frequency of RNA-Seq reads mapped to a gene, relative to library sizes (column sums of the count matrix). Since the relative frequencies sum to 1 in each library (one column of the count matrix), the increased relative frequencies of truly overexpressed genes in each column must be accompanied by decreased relative frequencies of other genes, even when those other genes are not truly differentially expressed. Robinson and Oshlack (2010) presented examples where this problem is noticeable.
A simple fix is to compute the relative frequencies relative to effective library sizes, that is, library sizes multiplied by normalization factors. Several authors (e.g., Robinson and Oshlack, 2010; Anders and Huber, 2010) propose estimating the normalization factors based on the assumption that most genes are NOT differentially expressed.
By default, prepare.nbp does not estimate the normalization
factors, but can incorporate user specified normalization factors
through the argument norm.factors.
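
The effect described above can be illustrated with a small base-R sketch. The count matrix and the normalization factors below are made up for illustration (they are assumptions, not output of any estimation method): only gene1 changes between libraries A and B, and the factor for B is hand-picked so that the unchanged genes line up.

```r
## Hypothetical toy data: 4 genes x 2 libraries; only gene1 differs.
counts <- matrix(c(100, 100, 100, 100,    # library A
                   500, 100, 100, 100),   # library B
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("A", "B")))

## Relative frequencies with respect to the raw library sizes.
lib.sizes <- colSums(counts)                      # A = 400, B = 800
rel.freq <- sweep(counts, 2, lib.sizes, "/")
print(rel.freq)   # gene2..gene4 look halved in B although their counts are unchanged

## Hand-picked normalization factors (an assumption for this sketch).
norm.factors <- c(A = 1, B = 0.5)
eff.lib.sizes <- lib.sizes * norm.factors         # A = 400, B = 400

## Relative frequencies with respect to the effective library sizes.
rel.freq.norm <- sweep(counts, 2, eff.lib.sizes, "/")
print(rel.freq.norm)   # gene2..gene4 now agree between A and B
```

Note that the normalized relative frequencies no longer sum to 1 within a library. With real data, factors obtained from a separate estimation step would be supplied through the norm.factors argument, e.g. prepare.nbp(counts, grp.ids, norm.factors = norm.factors).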
Library Size Adjustment
The exact test requires that the effective library sizes (column sums
of the count matrix multiplied by normalization factors) are
approximately equal. By default, prepare.nbp will thin
(downsample) the counts to make the effective library sizes
equal. Thinning may lose statistical efficiency, but is unlikely to
introduce bias.
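
To make the idea of thinning concrete, here is a minimal base-R sketch of binomial downsampling. It is an illustration under stated assumptions, not necessarily the procedure prepare.nbp uses internally: the toy counts are hypothetical, all normalization factors are set to 1, and the target is taken to be the smallest effective library size.

```r
## Hypothetical toy data: 4 genes x 3 libraries with unequal library sizes.
set.seed(1)
counts <- matrix(c(120, 30,  50, 200,    # lib1, total 400
                   240, 60, 100, 400,    # lib2, total 800
                    60, 15,  25, 100),   # lib3, total 200
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("lib", 1:3)))

## Effective library sizes (normalization factors of 1 assumed here).
norm.factors <- rep(1, ncol(counts))
eff.lib.sizes <- colSums(counts) * norm.factors

## Assumed target: the smallest effective library size.
target <- min(eff.lib.sizes)
keep.prob <- target / eff.lib.sizes   # per-library probability of keeping a read

## Keep each read independently with the library's keep probability.
thinned <- counts
for (j in seq_len(ncol(counts))) {
  thinned[, j] <- rbinom(nrow(counts), size = counts[, j], prob = keep.prob[j])
}

print(colSums(counts))    # unequal before thinning
print(colSums(thinned))   # roughly equal to the target after thinning
```

Because reads are discarded at random, the thinned counts carry less information than the originals, which is the loss of efficiency mentioned above; the library that already has the smallest size is left unchanged.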
See also: nbp.test

## Load the NBPSeq package
library(NBPSeq);
## Load Arabidopsis data
data(arab);
## Specify treatment groups
grp.ids = c(1, 1, 1, 2, 2, 2);
## Prepare an NBP object, adjust the library sizes by thinning the counts.
set.seed(999);
obj = prepare.nbp(arab, grp.ids, print.level=5);
## Print the NBP object
print.nbp(obj);