norMixFit: EM and MLE Estimation of Univariate Normal Mixtures

Description

These functions estimate the parameters of a univariate (finite) normal mixture using the EM algorithm or Likelihood Maximimization via optim(.., method = "BFGS").

Usage

norMixEM(x, m, name = NULL, sd.min = 1e-07* diff(range(x))/m,
         trafo = c("clr1", "logit"),
         maxiter = 100, tol = sqrt(.Machine$double.eps), trace = 1)
norMixMLE(x, m, name = NULL, 
          trafo = c("clr1", "logit"),
          maxiter = 100, tol = sqrt(.Machine$double.eps), trace = 2)

Value

An object of class norMix.

Arguments

x: numeric: the data for which the parameters are to be estimated.
m: integer or factor: If m has length 1 it specifies the number of mixture components, otherwise it is taken to be a vector of initial cluster assignments, see details below.
name: character, passed to norMix. The default, NULL, uses match.call().
sd.min: number: the minimal value that the normal components' standard deviations (sd) are allowed to take. A warning is printed if some of the final sd's are this boundary.
trafo: character string specifying the transformation of the component weight w \(m\)-vector (mathematical notation in norMix: \(\pi_j, j=1,\dots,m\)) to an \((m-1)\)-dimensional unconstrained parameter vector in our parametrization. See nM2par for details.
maxiter: integer: maximum number of EM iterations.
tol: numeric: EM iterations stop if relative changes of the log-likelihood are smaller than tol.
trace: integer (or logical) specifying if the iterations should be traced and how much output should be produced. The default, 1 prints a final one line summary, where trace = 2 produces one line of output per iteration.

Author

EM: Friedrich Leisch, originally; Martin Maechler vectorized it in \(m\), added trace etc.

MLE: M.Maechler

Details

Estimation of univariate mixtures can be very sensitive to initialization. By default, norMixEM and norMixLME cut the data into \(m\) groups of approximately equal size. See examples below for other initialization possibilities.

The EM algorithm consists in repeated application of E- and M- steps until convergence. Mainly for didactical reasons, we also provide the functions estep.nm, mstep.nm, and emstep.nm.

The MLE, Maximum Likelihood Estimator, maximizes the likelihood using optim, using the same advantageous parametrization as llnorMix.

Examples

Run this code

ex <- norMix(mu = c(-1,2,5), sig2 = c(1, 0.5, 3))
plot(ex, col="gray", p.norm=FALSE)

x <- rnorMix(100, ex)
lines(density(x))
rug(x)

## EM estimation may fail depending on random sample
ex1 <- norMixEM(x, 3, trace=2) #-> warning (sometimes)
ex1
plot(ex1)

## initialization by cut() into intervals of equal length:
ex2 <- norMixEM(x, cut(x, 3))
ex2

## initialization by kmeans():
k3 <- kmeans(x, 3)$cluster
ex3 <- norMixEM(x, k3)
ex3

## Now, MLE instead of EM:
exM <- norMixMLE(x, k3, tol = 1e-12, trace=4)
exM

## real data
data(faithful)
plot(density(faithful$waiting, bw = "SJ"), ylim=c(0,0.044))
rug(faithful$waiting)

(nmF <- norMixEM(faithful$waiting, 2))
lines(nmF, col=2)
## are three components better?
nmF3 <- norMixEM(faithful$waiting, 3, maxiter = 200)
lines(nmF3, col="forestgreen")

Run the code above in your browser using DataLab