
RIPSeeker (version 1.12.0)

nbm_em: Expectation conditional maximization of likelihood for negative binomial mixture model

Description

Given an input vector of integer read counts, the function optimizes the parameters of a K-component negative binomial mixture model (NBM) using expectation conditional maximization (ECM).

Usage

nbm_em(count, alpha, beta, wght, NBM_NIT_MAX = 250, NBM_TOL = 0.01)

Arguments

count
A vector of integers, conceptually representing the read counts within bins across a chromosome.
alpha
Initial values of $\alpha_k$ for the K NB components.
beta
Initial values of $\beta_k$ for the K NB components.
wght
Initial values of the mixing coefficients $\pi_k$ for the K NB components.
NBM_NIT_MAX
Maximum number of EM iterations (Default: 250).
NBM_TOL
Threshold on the fractional increase in log likelihood (given the current NBM parameters) compared with the log likelihood from the previous iteration. EM for the NBM stops when the improvement falls below this threshold (Default: 0.01); see the sketch below.
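
For illustration, the stopping rule can be expressed as a relative change in the log likelihood. The values below are made up, and the exact formula used internally may differ slightly; logl corresponds to the per-iteration log likelihoods that nbm_em also returns.

# Illustrative values only: log likelihoods from successive EM iterations
logl <- c(-5200, -4930, -4880, -4878)
i <- length(logl)
# EM stops once the fractional improvement drops below NBM_TOL (default 0.01)
(logl[i] - logl[i - 1]) / abs(logl[i - 1]) < 0.01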

Value

A list containing:

  • alpha: $\alpha_k$ for all K components of the NB mixture.
  • beta: $\beta_k$ for all K components of the NB mixture.
  • wght: $\pi_k$ for all K components of the NB mixture.
  • logl: log likelihood at each EM iteration.
  • postprob: posterior probabilities for each observed data point from the last EM iteration.
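
As a minimal sketch of how the returned list can be inspected (assuming nbm is the value returned by nbm_em, as constructed in the Examples section below):

# `nbm` is assumed to come from the nbm_em call in the Examples below
plot(nbm$logl, type = "b",
     xlab = "EM iteration", ylab = "log likelihood")  # should be non-decreasing
z_hat <- max.col(nbm$postprob)                        # most probable component per bin
table(z_hat)                                          # number of bins assigned to each component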

Details

Given a $K$-NBM, the goal is to maximize the likelihood function with respect to the parameters, comprising $\alpha_k$ and $\beta_k$ for the K NB components and the mixing coefficients $\pi_k$, which are the priors $p(z=k)$. Because there is no analytical solution for the maximum likelihood (ML) estimators of these quantities, a modified EM procedure called Expectation Conditional Maximization (ECM) is employed (Meng and Rubin, 1993).

In the E-step, the posterior probabilities are evaluated using the NB density functions with the current (initially user-supplied) $\alpha_k$, $\beta_k$, and $\pi_k$. In the CM-step, $\pi_k$ is updated first, followed by Newton updates of $\alpha_k$ and $\beta_k$. The EM iteration terminates when the percentage increase of the log likelihood drops below NBM_TOL, which is guaranteed to occur since EM converges. For more details, please refer to the RIPSeeker manuscript.
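
The following R sketch illustrates the E-step and the first CM-step of one such iteration. It assumes the Gamma-Poisson parameterization in which component k has mean $\alpha_k/\beta_k$, i.e. NB density dnbinom(size = alpha_k, prob = beta_k / (1 + beta_k)); the helper name ecm_step is hypothetical, this is not the internal code of nbm_em, and the Newton conditional-maximization updates of $\alpha_k$ and $\beta_k$ are omitted.

# Minimal sketch of one ECM iteration for a K-component NB mixture (illustration only)
ecm_step <- function(count, alpha, beta, wght) {
  K <- length(wght)
  # E-step: per-component NB densities for each bin
  dens <- sapply(seq_len(K), function(k)
    dnbinom(count, size = alpha[k], prob = beta[k] / (1 + beta[k])))
  joint <- sweep(dens, 2, wght, "*")       # p(x, z = k) for each bin and component
  logl  <- sum(log(rowSums(joint)))        # log likelihood at current parameters
  postprob <- joint / rowSums(joint)       # posterior responsibilities p(z = k | x)
  # CM-step 1: update the mixing coefficients pi_k
  wght <- colMeans(postprob)
  # CM-step 2 (omitted here): Newton updates of alpha_k and beta_k given the new pi_k
  list(alpha = alpha, beta = beta, wght = wght, logl = logl, postprob = postprob)
}

Iterating such a step and stopping when the relative increase in logl falls below NBM_TOL mirrors the termination rule described above.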

References

Bishop, Christopher. Pattern Recognition and Machine Learning. Pages 605-631, Information Science and Statistics. Springer Science, 2006.

X. L. Meng, D. B. Rubin, Maximum likelihood estimation via the ECM algorithm: a general framework, Biometrika, 80(2):267-278 (1993).

J. A. Fessler, A. O. Hero, Space-alternating generalized expectation-maximization algorithm, IEEE Transactions on Signal Processing, 42(10):2664-2677 (1994).

Cappé, O. (2001). H2M: A set of MATLAB/OCTAVE functions for the EM estimation of mixtures and hidden Markov models. (http://perso.telecom-paristech.fr/cappe/h2m/)

See Also

nbh_init, nbh, nbh.GRanges, nbh_em

Examples

# Load RIPSeeker, which provides nbh_gen and nbm_em
library(RIPSeeker)

# Simulate data from a 2-state NB hidden Markov model
TRANS_s <- matrix(c(0.9, 0.1, 0.3, 0.7), nrow=2, byrow=TRUE)
alpha_s <- c(2, 4)
beta_s  <- c(1, 0.25)
Total <- 1000
x <- nbh_gen(TRANS_s, alpha_s, beta_s, Total)

N <- 2                   # number of mixture components

cnt <- x$count           # simulated read counts
label <- x$label         # true component labels

Total <- length(cnt)

# Dummy initialization of the NBM parameters
wght0  <- c(0.5, 0.5)
alpha0 <- c(1, 20)
beta0  <- c(1, 1)

NIT_MAX <- 50
TOL <- 1e-100

# Initialize parameters with the NB mixture model via ECM
nbm <- nbm_em(cnt, alpha0, beta0, wght0, NIT_MAX, TOL)

# Accuracy of the maximum a posteriori component assignment
map.accuracy <- length(which(max.col(nbm$postprob) == label))/Total

print(map.accuracy)
