normalmixEM: EM Algorithm for Mixtures of Univariate Normals

Description

Return EM algorithm output for mixtures of normal distributions.

Usage

normalmixEM(x, lambda = NULL, mu = NULL, sigma = NULL, k = 2,
            mean.constr = NULL, sd.constr = NULL,
            epsilon = 1e-08, maxit = 1000, maxrestarts = 20,
            verb = FALSE, fast = FALSE, ECM = FALSE,
            arbmean = TRUE, arbvar = TRUE)

Arguments

x

A vector of length n consisting of the data.

lambda

Initial value of mixing proportions. Automatically repeated as necessary to produce a vector of length k, then normalized to sum to 1. If NULL, then lambda is random from a uniform Dirichlet distributi

mu

Starting value of vector of component means. If non-NULL and a scalar, arbmean is set to FALSE. If non-NULL and a vector, k is set to length(mu). If NULL, then the initial value is randomly g

sigma

Starting value of vector of component standard deviations for algorithm. If non-NULL and a scalar, arbvar is set to FALSE. If non-NULL and a vector, arbvar is set to TRUE and k i

k

Number of components. Initial value ignored unless mu and sigma are both NULL.

mean.constr

Equality constraints on the mean parameters, given as a vector of length k. Each vector entry helps specify the constraints, if any, on the corresponding mean parameter: If NA, the corresponding parameter is unconstrai

sd.constr

Equality constraints on the standard deviation parameters. See mean.constr.

epsilon

The convergence criterion. Convergence is declared when the observed-data log-likelihood increases by less than epsilon from one iteration to the next.
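The stopping rule described for epsilon can be sketched in base R as follows. This is an illustration only: mix_loglik and has_converged are hypothetical helper names, not functions exported by mixtools, and the package's internal check may differ in detail.

```r
# Hypothetical helper: observed-data log-likelihood of a univariate
# normal mixture with weights lambda, means mu and sds sigma.
mix_loglik <- function(x, lambda, mu, sigma) {
  total <- numeric(length(x))
  for (j in seq_along(lambda)) {
    # Add the weighted density of component j at every observation.
    total <- total + lambda[j] * dnorm(x, mean = mu[j], sd = sigma[j])
  }
  sum(log(total))
}

# Hypothetical stopping rule: stop once the gain in log-likelihood
# between two successive iterations falls below epsilon.
has_converged <- function(old_loglik, new_loglik, epsilon = 1e-08) {
  (new_loglik - old_loglik) < epsilon
}
```

A larger epsilon (for example 1e-03) stops the iterations earlier at the cost of a less precise estimate.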

maxit

The maximum number of iterations.

maxrestarts

The maximum number of restarts allowed in case of a problem with the particular starting values chosen due to one of the variance estimates getting too small (each restart uses randomly chosen starting values). It is well-known that when each

verb

If TRUE, then various updates are printed during each iteration of the algorithm.

fast

If TRUE and k==2 and arbmean==TRUE, then use normalmixEM2comp, which is a much faster version of the EM algorithm for this case. This version is less protected against certain kinds of u

ECM

logical: Should this algorithm be an ECM algorithm in the sense of Meng and Rubin (1993)? If FALSE, the algorithm is a true EM algorithm; if TRUE, then every half-iteration alternately updates the means conditional on the variances or the varia

arbmean

If TRUE, then the component densities are allowed to have different mus. If FALSE, then a scale mixture will be fit. Initial value ignored unless mu is NULL.

arbvar

If TRUE, then the component densities are allowed to have different sigmas. If FALSE, then a location mixture will be fit. Initial value ignored unless sigma is NULL.

Value

normalmixEM returns a list of class mixEM with items:

x

The raw data.

lambda

The final mixing proportions.

mu

The final mean parameters.

sigma

The final standard deviations. If arbmean = FALSE, then only the smallest standard deviation is returned. See scale below.

scale

If arbmean = FALSE, then the scale factor for the component standard deviations is returned. Otherwise, this is omitted from the output.

loglik

The final log-likelihood.

posterior

An n x k matrix of posterior probabilities for observations.

all.loglik

A vector of each iteration's log-likelihood. This vector includes both the initial and the final values; thus, the number of iterations is one less than its length.

restarts

The number of times the algorithm restarted due to unacceptable choice of initial values.

ft

A character vector giving the name of the function.
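As a brief illustration of how the returned list might be inspected, the sketch below fits a two-component mixture to the built-in faithful data and reads off a few of the items listed above. It assumes the mixtools package is installed; the object names fit, n_iter and cluster are chosen here for illustration.

```r
library(mixtools)  # assumes the mixtools package is installed

set.seed(100)
fit <- normalmixEM(faithful$waiting, k = 2)

fit$lambda   # final mixing proportions
fit$mu       # final component means
fit$sigma    # final component standard deviations

# all.loglik includes the initial value, so the number of
# iterations is one less than its length.
n_iter <- length(fit$all.loglik) - 1

# Hard assignment: for each observation, the component with the
# largest posterior probability.
cluster <- apply(fit$posterior, 1, which.max)
table(cluster)
```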

Details

This is the standard EM algorithm for normal mixtures, which maximizes the conditional expected complete-data log-likelihood at each M-step. If desired, the EM algorithm may be replaced by an ECM algorithm (see the ECM argument) that alternates between maximizing with respect to mu and lambda while holding sigma fixed, and maximizing with respect to sigma and lambda while holding mu fixed. When arbmean is FALSE and arbvar is TRUE, there is no closed-form EM algorithm, so the ECM option is forced.
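For the unconstrained case (arbmean = TRUE, arbvar = TRUE), a single EM iteration can be sketched in base R as below. This is a minimal sketch of the textbook updates, not the mixtools source: em_step is a hypothetical name, and the sketch omits the restarts, constraints and ECM variant described above.

```r
# Hypothetical helper: one EM iteration for a k-component
# univariate normal mixture with free means and variances.
em_step <- function(x, lambda, mu, sigma) {
  n <- length(x)
  k <- length(lambda)

  # E-step: n x k matrix of weighted component densities,
  # normalized by row to give posterior membership probabilities.
  dens <- matrix(0, nrow = n, ncol = k)
  for (j in seq_len(k)) {
    dens[, j] <- lambda[j] * dnorm(x, mean = mu[j], sd = sigma[j])
  }
  post <- dens / rowSums(dens)

  # M-step: closed-form weighted updates.
  nj <- colSums(post)                      # effective component sizes
  lambda_new <- nj / n
  mu_new <- colSums(post * x) / nj         # x recycles down each column
  resid2 <- outer(x, mu_new, "-")^2        # n x k squared residuals
  sigma_new <- sqrt(colSums(post * resid2) / nj)

  # loglik is evaluated at the input (pre-update) parameters.
  list(lambda = lambda_new, mu = mu_new, sigma = sigma_new,
       posterior = post, loglik = sum(log(rowSums(dens))))
}

# Usage on simulated data with assumed starting values.
set.seed(1)
x <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 5, sd = 1))
pars <- list(lambda = c(0.5, 0.5), mu = c(-1, 6), sigma = c(1, 1))
loglik_trace <- numeric(0)
for (iter in 1:50) {
  pars <- em_step(x, pars$lambda, pars$mu, pars$sigma)
  loglik_trace <- c(loglik_trace, pars$loglik)
}
```

The recorded log-likelihood values should never decrease from one iteration to the next, which is the property the epsilon criterion relies on.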

References

McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models, John Wiley & Sons, Inc.
Meng, X.-L. and Rubin, D. B. (1993) Maximum Likelihood Estimation Via the ECM Algorithm: A General Framework, Biometrika 80(2): 267-278.
Benaglia, T., Chauveau, D., Hunter, D. R., and Young, D. (2009) mixtools: An R Package for Analyzing Finite Mixture Models, Journal of Statistical Software 32(6): 1-29.

Examples

## Analyzing the Old Faithful geyser data with a 2-component mixture of normals.

library(mixtools)
data(faithful)
waiting <- faithful$waiting  # use the column directly instead of attach()
set.seed(100)
system.time(out <- normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03))
out
system.time(out2 <- normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03, fast = TRUE))
out2 # same thing but much faster