llnorMix: Likelihood, Parametrization and EM-Steps For 1D Normal Mixtures

Description

These functions work with an almost unconstrained parametrization of univariate normal mixtures.

llnorMix(p, *): computes the log likelihood,
obj <- par2norMix(p): maps parameter vector p to a norMix object obj,
p <- nM2par(obj): maps from a norMix object obj to parameter vector p,

where p is always a parameter vector in our parametrization.

Partly for didactical reasons, the following functions provide the basic ingredients for the EM algorithm (see also norMixEM) to parameter estimation:

estep.nm(x, obj, p): computes 1 E-step for the data x, given either a "norMix" object obj or parameter vector p.
mstep.nm(x, z): computes 1 M-step for the data x and the probability matrix z.
emstep.nm(x, obj): computes 1 E- and 1 M-step for the data x and the "norMix" object obj.

where again, p is a parameter vector in our parametrization, x is the (univariate) data, and z is a $n \times m$ matrix of (posterior) conditional probabilities, and $\theta$ is the full parameter vector of the mixture model.

Usage

llnorMix(p, x, m = (length(p) + 1)/3)
par2norMix(p, name = sprintf("{from %s}", deparse(substitute(p))[1]))
nM2par(obj)
 estep.nm(x, obj, par)
 mstep.nm(x, z)
emstep.nm(x, obj)

Arguments

p, par

numeric vector: our parametrization of a univariate normal mixture, see details.

numeric: the data for which the likelihood is to be computed.

integer number of mixture components; this is not to be changed for a given p.

name

(for par2norMix():) a name for the "norMix" object that is returned.

obj

a "norMix" object, see norMix.

a $n \times m$ matrix of (posterior) conditional probabilities, $z_{ij}= P(x_i \in C_j | \theta)$, where $C_j$ denotes the $j$-th group (“cluster”).

Value

llnorMix() returns a number, namely the log-likelihood.

par2norMix() returns "norMix" object, see norMix.

nM2par() returns the parameter vector $\theta$ of length $3k-1$.

estep.nm() returns z, the matrix of (conditional) probabilities.

mstep.nm() returns the model parameters as a list with components w, mu, and sigma, corresponding to the arguments of norMix(). (and see the 'Examples' on using do.call(norMix, *) with it.)

emstep.nm() returns an updated "norMix" object.

Details

We use a parametrization of a (finite) univariate normal mixture which is particularly apt for likelihood maximization, namely, one whose parameter space is almost a full $\mathbf{I\hskip-0.22em R}^m$, $m = 3k-1$.

For a $k$-component mixture, we map to and from a parameter vector $\theta$ (== p as R-vector) of length $3k-1$. For mixture density $$\sum_{j=1}^k \pi_j \phi((t - \mu_j)/\sigma_j)/\sigma_j,$$ we logit-transform the $\pi_j$ (for $j \ge 2$) and log-transform the $\sigma_j$, such that $\theta$ is partitioned into

p[ 1:(k-1)]:: p[j]$= \mbox{logit}(\pi_{j+1})$ and $\pi_1$ is given implicitly as $\pi_1 = 1-\sum_{j=2}^k \pi_j$.
p[ k:(2k-1)]:: p[k-1+ j]$= \mu_j$, for j=1:k.
p[2k:(3k-1)]:: p[2*k-1+ j] $= \log(\sigma_j)$, i.e., $\sigma_j^2 = exp(2*p[.+j])$.

Examples

Run this code

# NOT RUN {
(obj <- MW.nm10) # "the Claw" -- m = 6 components
length(pp <- nM2par(obj)) # 17 == (3*6) - 1
par2norMix(pp)
## really the same as the initial 'obj' above

## Log likelihood (of very artificial data):
llnorMix(pp, x = seq(-2, 2, length=1000))
set.seed(47)## of more realistic data:
x <- rnorMix(1000, obj)
llnorMix(pp, x)

## Consistency check :  nM2par()  and  par2norMix()  are inverses
all.EQ <- function(x,y, tol = 1e-15, ...) all.equal(x,y, tolerance=tol, ...)
stopifnot(all.EQ(pp, nM2par(par2norMix(pp))),
          all.EQ(obj, par2norMix(nM2par(obj)),
                    check.attributes=FALSE),
          ## Direct computation of log-likelihood:
          all.EQ(sum(dnorMix(x, obj, log=TRUE)),
                    llnorMix(pp, x))  )

## E- and M- steps : ------------------------------
rE1 <- estep.nm(x, obj)
rE2 <- estep.nm(x, par=pp) # the same as rE1
z <- rE1
str( rM <-  mstep.nm(x, z))
   (rEM <- emstep.nm(x, obj))

stopifnot(all.EQ(rE1, rE2),
          all.EQ(rEM, do.call(norMix, c(rM, name=""))))
# }

Run the code above in your browser using DataLab