mixtools (version 1.0.4)

normalmixEM2comp: Fast EM Algorithm for 2-Component Mixtures of Univariate Normals

Description

Return EM algorithm output for mixtures of univariate normal distributions for the special case of 2 components, exploiting the simple structure of the problem to speed up the code.

Usage

normalmixEM2comp(x, lambda, mu, sigsqrd, eps = 1e-8, maxit = 1000, verb = FALSE)

Arguments

x
A vector of length n consisting of the data.
lambda
Initial value of first-component mixing proportion.
mu
A 2-vector of initial values for the mean parameters.
sigsqrd
Either a scalar or a 2-vector with initial value(s) for the variance parameters. If a scalar, the algorithm assumes that the two components have equal variances; if a 2-vector, it assumes that the two components have unequal variances (see the sketch after this argument list).
eps
The convergence criterion. Convergence is declared when the observed-data log-likelihood increases by less than eps from one iteration to the next.
maxit
The maximum possible number of iterations.
verb
If TRUE, then various updates are printed during each iteration of the algorithm.
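
A minimal, informal sketch (not part of the package's own documentation) of the two sigsqrd forms, using the faithful waiting times as in the Examples below; fit.eq and fit.uneq are just illustrative names:

library(mixtools)
data(faithful)
## Scalar sigsqrd: a single common variance is assumed for both components
fit.eq <- normalmixEM2comp(faithful$waiting, lambda = 0.5,
                           mu = c(50, 80), sigsqrd = 100)
## 2-vector sigsqrd: each component gets its own variance
fit.uneq <- normalmixEM2comp(faithful$waiting, lambda = 0.5,
                             mu = c(50, 80), sigsqrd = c(100, 100))
fit.eq$sigma    # one common standard deviation
fit.uneq$sigma  # two component-specific standard deviations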

Value

normalmixEM2comp returns a list of class mixEM with items:
x
The raw data.
lambda
The final mixing proportions (lambda and 1-lambda).
mu
The final two mean parameters.
sigma
The final one or two standard deviations.
loglik
The final log-likelihood.
posterior
An n x 2 matrix of posterior probabilities for the observations.
all.loglik
A vector of each iteration's log-likelihood. This vector includes both the initial and the final values; thus, the number of iterations is one less than its length (as illustrated after this list).
restarts
The number of times the algorithm restarted due to unacceptable choice of initial values (always zero).
ft
A character vector giving the name of the function.
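
As a quick illustration (assuming out is a fitted mixEM object such as the one produced in the Examples below), the returned items can be inspected like this:

length(out$all.loglik) - 1  # number of EM iterations performed
out$lambda                  # c(lambda, 1 - lambda)
out$mu                      # the two fitted means
dim(out$posterior)          # n x 2 matrix of posterior probabilities
out$restarts                # always 0 for normalmixEM2comp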

Details

This code is written to be very fast, sometimes more than an order of magnitude faster than normalmixEM for the same problem. It is less numerically stable than normalmixEM in the sense that it does not safeguard against underflow as carefully. Note that when the two components are assumed to have unequal variances, the log-likelihood is unbounded. However, in practice this is rarely a problem, and the algorithm quite often converges to a "nice" local maximum.
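
For instance, here is a hedged sketch of the unequal-variance case on the faithful waiting times; checking the fitted standard deviations and the log-likelihood trace is one informal way to spot a degenerate solution:

library(mixtools)
data(faithful)
set.seed(100)
fit <- normalmixEM2comp(faithful$waiting, lambda = 0.5,
                        mu = c(50, 80), sigsqrd = c(100, 100))
fit$sigma                       # check that neither standard deviation has collapsed toward 0
all(diff(fit$all.loglik) >= 0)  # the log-likelihood trace should be nondecreasing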

References

McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models, John Wiley & Sons, Inc.

See Also

mvnormalmixEM, normalmixEM

Examples

## Analyzing the Old Faithful geyser data with a 2-component mixture of normals.
library(mixtools)
data(faithful)
attach(faithful)
set.seed(100)
system.time(out <- normalmixEM2comp(waiting, lambda=.5, 
            mu=c(50,80), sigsqrd=100))
out$all.loglik # Note:  must be monotone increasing

# Compare elapsed time with more general version
system.time(out2 <- normalmixEM(waiting, lambda=c(.5,.5), 
            mu=c(50,80), sigma=c(10,10), arbvar=FALSE))
out2$all.loglik # Values should be identical to above
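
## A possible extension (not part of the package's own example): hard-classify
## each observation by its larger posterior probability.
classes <- apply(out$posterior, 1, which.max)
table(classes)   # counts of observations assigned to each component
detach(faithful)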
