estimateCommonDisp: Estimate Common Negative Binomial Dispersion by Conditional Maximum Likelihood

Description

Maximizes the negative binomial conditional common likelihood to estimate a common dispersion value across all genes.

Usage

## S3 method for class 'DGEList'
estimateCommonDisp(y, tol=1e-06, rowsum.filter=5, verbose=FALSE, ...)
## Default S3 method:
estimateCommonDisp(y, group=NULL, lib.size=NULL, tol=1e-06, rowsum.filter=5, verbose=FALSE, ...)

Arguments

y

matrix of counts or a DGEList object.

tol

the desired accuracy, passed to optimize.

rowsum.filter

genes with total count (across all samples) below this value will be filtered out before estimating the dispersion.

verbose

logical, if TRUE then the estimated dispersion and BCV will be printed to standard output.

group

vector or factor giving the experimental group/condition for each library.

lib.size

numeric vector giving the total count (sequence depth) for each library.

...

other arguments that are not currently used.

Value

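When y is a DGEList object, the following components are added to it:

common.dispersion: estimate of the common dispersion.
pseudo.counts: numeric matrix of pseudo-counts.
pseudo.lib.size: the common library size to which the pseudo-counts have been adjusted.
AveLogCPM: numeric vector giving log2(AveCPM) for each row of y.

When y is a matrix, the default method returns the common dispersion estimate as a single numeric value.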

Details

Implements the conditional maximum likelihood (CML) method proposed by Robinson and Smyth (2008) for estimating a common dispersion parameter. This method proves to be accurate and nearly unbiased even for small counts and small numbers of replicates.

The CML method involves computing a matrix of quantile-quantile normalized counts, called pseudo-counts. The pseudo-counts are adjusted in such a way that the library sizes are equal for all samples, while preserving differences between groups and variability within each group. The pseudo-counts are included in the output of the function, but are intended mainly for internal edgeR use.
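As a rough illustration of how the components listed under Value might be inspected after a fit, the sketch below simulates a small count matrix, runs the function, and reads the results back from the DGEList. It assumes the edgeR package is installed. The simulated data and the variable names (counts, dge, disp, bcv) are illustrative only, and the BCV is computed here as the square root of the common dispersion.

```r
# Sketch only: assumes the edgeR package is installed; the data are simulated.
if (requireNamespace("edgeR", quietly = TRUE)) {
  set.seed(1)
  counts <- matrix(rnbinom(250 * 4, mu = 20, size = 5), nrow = 250, ncol = 4)
  dge <- edgeR::DGEList(counts = counts, group = c(1, 1, 2, 2))
  dge <- edgeR::estimateCommonDisp(dge)

  disp <- dge$common.dispersion   # single common dispersion estimate
  bcv <- sqrt(disp)               # biological coefficient of variation
  print(c(dispersion = disp, BCV = bcv))

  print(dim(dge$pseudo.counts))   # matrix of pseudo-counts
  print(dge$pseudo.lib.size)      # common library size for the pseudo-counts
  print(head(dge$AveLogCPM))      # one value per row of the count matrix
}
```

The requireNamespace guard is only there so the snippet does nothing when edgeR is unavailable; in an interactive session a plain library(edgeR) call would be the usual choice.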

References

Robinson MD and Smyth GK (2008). Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics, 9, 321-332. http://biostatistics.oxfordjournals.org/content/9/2/321

Examples

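library(edgeR)

# Simulate 250 genes x 4 samples; true dispersion is 1/size = 1/5 = 0.2
y <- matrix(rnbinom(250*4, mu=20, size=5), nrow=250, ncol=4)
# Two groups with two replicate libraries each
dge <- DGEList(counts=y, group=c(1,1,2,2))
# Estimate the common dispersion; verbose=TRUE prints the dispersion and BCV
dge <- estimateCommonDisp(dge, verbose=TRUE)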
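
The stated true dispersion of 0.2 follows from the rnbinom parameterization, in which the variance is mu + mu^2/size, so the dispersion equals 1/size. The base R sketch below checks this with a crude moment-based estimate on a large simulated sample. It is not the conditional maximum likelihood method used by estimateCommonDisp, and the variable names (phi, theoretical_var, phi_moment) are illustrative only.

```r
# Base R only: a moment-based sanity check, not the CML estimator.
set.seed(42)
mu <- 20
size <- 5
phi <- 1 / size                      # dispersion implied by the size parameter

# Large sample from the same negative binomial used in the example
x <- rnbinom(1e5, mu = mu, size = size)

# Negative binomial variance: mu + phi * mu^2
theoretical_var <- mu + phi * mu^2

# Solve var = mean + phi * mean^2 for phi using sample moments
phi_moment <- (var(x) - mean(x)) / mean(x)^2

print(theoretical_var)
print(phi_moment)
```

With only four libraries per gene, as in the example above, a per-gene moment estimate of this kind would be very noisy, which is the setting the common dispersion estimate is meant for.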
