estimate.dispersion: Estimate Negative Binomial Dispersion

Description

Estimate NB dispersion by modeling it as a function of the mean frequency and library sizes.

Usage

estimate.dispersion(nb.data, x,
    method = "log-linear-rel-mean", ...)

Arguments

nb.data

output from prepare.nb.data.

x

a design matrix specifying the mean structure of each row.

method

the method for estimating the dispersion parameter. Currently, the only implemented option is "log-linear-rel-mean", which assumes that the log dispersion is a linear function of the log relative mean.

...

additional parameters.

Value

a list with two components:

estimates

dispersion estimates for each read count, a matrix of the same dimensions as the counts matrix in nb.data.

models

a list of dispersion models, NOT intended for use by end users.

Details

We use a negative binomial (NB) distribution to model the read frequency of gene $i$ in sample $j$. The NB distribution uses a dispersion parameter $\phi_{ij}$ to model the extra-Poisson variation between biological replicates. Under the NB model, the mean and variance of a single read count satisfy $\sigma_{ij}^2 = \mu_{ij} + \phi_{ij} \mu_{ij}^2$.

Because RNA-Seq experiments typically have small sample sizes, estimating the NB dispersion $\phi_{ij}$ separately for each gene $i$ is not reliable. One can instead pool information across genes and biological samples by modeling $\phi_{ij}$ as a function of the mean frequencies and library sizes.

The "log-linear-rel-mean" method assumes a parametric dispersion model $$\log(\phi_{ij}) = \alpha_0 + \alpha_1 \log(\pi_{ij}),$$ where $\pi_{ij} = \mu_{ij}/(N_j R_j)$ is the relative mean frequency after normalization, with $N_j$ the library size and $R_j$ the normalization factor of sample $j$. The parameters $(\alpha_0, \alpha_1)$ in this dispersion model are estimated by maximizing the adjusted profile likelihood.
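
The following base R sketch illustrates the two relationships described above; it does not call this package and is not its implementation. The helpers nb.variance and log.linear.dispersion, and all numeric values (library sizes, means, alpha0, alpha1), are hypothetical and chosen only for illustration. The sketch assumes the log link stated for the "log-linear-rel-mean" method, i.e. that log dispersion is linear in the log relative mean, and it relies on the fact that rnbinom with arguments size and mu has variance mu + mu^2/size, so that size = 1/phi.

```r
set.seed(1)

# NB variance under the parameterization sigma^2 = mu + phi * mu^2.
nb.variance <- function(mu, phi) mu + phi * mu^2

# Hypothetical helper: dispersion as a function of the relative mean,
# assuming log(phi) = alpha0 + alpha1 * log(pi).
log.linear.dispersion <- function(rel.mean, alpha0, alpha1) {
  exp(alpha0 + alpha1 * log(rel.mean))
}

# Made-up values for one gene observed in two samples.
lib.sizes    <- c(1e6, 2e6)   # N_j
norm.factors <- c(1, 1)       # R_j
mu           <- c(50, 100)    # mean read counts mu_ij

# Relative mean frequency pi_ij = mu_ij / (N_j * R_j).
rel.mean <- mu / (lib.sizes * norm.factors)

# Dispersion implied by illustrative (not estimated) coefficients.
phi <- log.linear.dispersion(rel.mean, alpha0 = -4, alpha1 = -0.2)

# Simulate counts for the first sample and compare the sample variance
# with the variance implied by the NB mean-variance relationship.
y <- rnbinom(1e5, size = 1 / phi[1], mu = mu[1])
c(sample.var = var(y), model.var = nb.variance(mu[1], phi[1]))
```

Because both samples have the same relative mean in this toy setup, they receive the same dispersion even though their library sizes differ, which is the sense in which the model pools information through $\pi_{ij}$ rather than through raw counts.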