fitNB: Fit Negative Binomial Mixture Model

Description

The function fits a two-component Negative Binomial mixture model.

Usage

fitNB(y, d=NULL, inits=NULL, model='V', zeroPercentThr=0.2)

Value

A vector consisting parameter estimates of mu1, mu2, phi1, phi2, pi1, logLik and BIC. For 0-inflated model, mu1=phi1=0.

Arguments

y: A vector representing the RNAseq raw count.
d: A vector of the same length as y representing the normalization constant to be applied to the data.
inits: Initial value to fit the mixture model. A vector with elements mu1, mu2, phi1, phi2 and pi1. For 0-inflated model, only mu2, phi2, pi1 are used while the other elements can be arbitrary.
model: Character specifying E or V model. E model fits the mixture model with equal dispersion phi while V model doesn't put any constraint.
zeroPercentThr: A scalar specifying the minimum percent of zero counts needed when fitting a zero-inflated Negative Binomial model. This parameter is used to deal with zero-inflation in RNAseq count data. When the percent of zero exceeds this threshold, rather than fitting a 2-component negative binomial mixture, a mixture of point mass at 0 and negative binomial is fitted.

Author

Pan Tong (nickytong@gmail.com), Kevin R Coombes (krc@silicovore.com)

Details

This function directly maximize the log likelihood function through optimization. With this function, three models can be fitted: (1) negative binomial mixture with equal dispersion (E model); (2) negative binomial mixture with unequal dispersion (V model); (3) 0-inflated negative binomial model. The 0-inflated negative binomial has the following density function:

\(P(Y=y)=\pi D(y) + (1-\pi)NB(\mu, \phi)\) where D is the point mass at 0 while \(NB(\mu, \phi)\) is the density of negative binomial distribution with mean \(\mu\) and dispersion \(\phi\). The variance is \(\mu+\phi \mu^2\).

The rule to fit 0-inflated model is that the observed percentage of count exceeds the user specified threshold. This rule overrides the model argument when observed percentae of zero count exceeds the threshold.

References

Tong, P., Chen, Y., Su, X. and Coombes, K. R. (2012). Systematic Identification of Bimodally Expressed Genes Using RNAseq Data. Bioinformatics, 2013 Mar 1;29(5):605-13.

Examples

Run this code

# artificial RNAseq data from negative binomial distribution
set.seed(1000)
dat <- rnbinom(100, mu=1000, size=1/0.2)
fitNB(y=dat)

Run the code above in your browser using DataLab