mixtools (version 1.0.4)

npEM: Nonparametric EM-like Algorithm for Mixtures of Independent Repeated Measurements

Description

Returns the output of the nonparametric EM-like algorithm (Benaglia et al., 2009) for mixtures of multivariate (repeated measures) data, where the coordinates of a row (case) in the data matrix are assumed to be independent, conditional on the mixture component (subpopulation) from which they are drawn.

Usage

npEM(x, mu0, blockid = 1:ncol(x), bw = bw.nrd0(as.vector(as.matrix(x))), samebw = TRUE, h = bw, eps = 1e-8, maxiter = 500, stochastic = FALSE, verb = TRUE)

Arguments

x
An $n \times r$ matrix of data. Each of the $n$ rows is a case, and each case has $r$ repeated measurements. These measurements are assumed to be independent, conditional on the mixture component (subpopulation) from which the case is drawn.
mu0
Either an $m \times r$ matrix specifying the initial centers for the kmeans function, or an integer $m$ specifying the number of initial centers, which are then chosen randomly in kmeans.
blockid
A vector of length $r$ identifying coordinates (columns of x) that are assumed to be identically distributed (i.e., in the same block). For instance, the default has all distinct elements, indicating that no two coordinates are assumed identically distributed and thus a separate set of $m$ density estimates is produced for each column of $x$. On the other hand, if blockid=rep(1,ncol(x)), then the coordinates in each row are assumed conditionally i.i.d.
bw
Bandwidth for density estimation, equal to the standard deviation of the kernel density. The default is a simplistic choice: it applies bw.nrd0 (the default bandwidth rule used by density) to the entire dataset pooled into a single vector.
samebw
Logical: if TRUE, use the same bandwidth for each iteration and for each component and block. If FALSE, use a separate bandwidth for each component and block, and update this bandwidth at each iteration of the algorithm using a suitably modified bw.nrd0 method, as described in Benaglia et al. (2011).
h
Alternative way to specify the bandwidth, to provide backward compatibility.
eps
Tolerance limit for declaring algorithm convergence. Convergence is declared whenever the maximum change in any coordinate of the lambda vector (of mixing proportion estimates) does not exceed eps.
maxiter
The maximum number of iterations allowed, for both stochastic and non-stochastic versions; for non-stochastic algorithms (stochastic = FALSE), convergence may be declared before maxiter iterations (see eps above).
stochastic
Logical: if FALSE (the default), run the non-stochastic version of the npEM algorithm, as in Benaglia et al. (2009). If TRUE, run a stochastic version that simulates the posteriors at each iteration and always runs for maxiter iterations.
verb
Logical: if TRUE, print an update at every iteration of the algorithm as it runs.
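
The defaults for bw and blockid described above can be reproduced outside npEM with base R alone. The following is a minimal sketch, assuming a simulated matrix x as a stand-in for real data (the matrix and the variable names are illustrative and not part of mixtools):

```r
# Simulated stand-in for a data matrix: n = 200 cases, r = 4 repeated measurements
set.seed(1)
x <- matrix(rnorm(200 * 4), nrow = 200, ncol = 4)

# Default bandwidth: bw.nrd0 applied to all entries of x pooled into one vector
bw_default <- bw.nrd0(as.vector(as.matrix(x)))

# Default blockid: every column forms its own block (no two coordinates i.d.)
blockid_default <- 1:ncol(x)

# Alternative: a single block, i.e. coordinates conditionally i.i.d. within a row
blockid_iid <- rep(1, ncol(x))

# Number of distinct blocks implied by each choice
n_blocks_default <- length(unique(blockid_default))
n_blocks_iid <- length(unique(blockid_iid))
```

With the default blockid, one set of $m$ density estimates is produced per column; with a single block, only one set is produced for all columns.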

Value

npEM returns a list of class npEM with the following items:
data
The raw data (an $n \times r$ matrix).
posteriors
An $n \times m$ matrix of posterior probabilities, one row for each observation. If stochastic = TRUE, this matrix is computed as an average over the maxiter iterations.
bandwidth
If samebw = TRUE, same as the bw input argument; otherwise, the value of the bandwidth matrix at the final iteration. This information is needed by any method that produces density estimates from the output.
blockid
Same as the blockid input argument, but recoded to have positive integer values. Also needed by any method that produces density estimates from the output.
lambda
The sequence of mixing proportions over iterations.
lambdahat
The final mixing proportions if stochastic = FALSE, or the average mixing proportions if stochastic = TRUE.
loglik
The sequence of log-likelihoods over iterations.
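
The convergence rule documented for eps can be checked against a lambda sequence like the one returned here. The following is a minimal sketch in base R, assuming a hypothetical lambda matrix with one row per iteration and one column per component; it illustrates the criterion as described in the eps argument and is not the package's internal code:

```r
# Hypothetical sequence of mixing proportions:
# rows are iterations, columns are the m = 2 components
lambda <- rbind(c(0.50, 0.50),
                c(0.42, 0.58),
                c(0.40, 0.60),
                c(0.40, 0.60))
eps <- 1e-8

# Maximum absolute change in any coordinate between successive iterations
max_change <- apply(abs(diff(lambda)), 1, max)

# First iteration at which the change does not exceed eps
# (the +1 converts from "difference index" to "iteration index")
converged_at <- which(max_change <= eps)[1] + 1
```

Under these made-up values, the last two rows are identical, so the criterion is met at the final iteration.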

References

  • Benaglia, T., Chauveau, D., and Hunter, D. R. (2009), An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics, 18, 505-526.
  • Benaglia, T., Chauveau, D., Hunter, D. R., and Young, D. (2009), mixtools: An R package for analyzing finite mixture models, Journal of Statistical Software, 32(6), 1-29.
  • Benaglia, T., Chauveau, D., and Hunter, D. R. (2011), Bandwidth selection in an EM-like algorithm for nonparametric multivariate mixtures, Nonparametric Statistics and Mixture Models: A Festschrift in Honor of Thomas P. Hettmansperger, World Scientific Publishing Co., pages 15-27.
  • Bordes, L., Chauveau, D., and Vandekerkhove, P. (2007), An EM algorithm for a semiparametric mixture model, Computational Statistics and Data Analysis, 51, 5429-5443.

See Also

plot.npEM, normmixrm.sim, spEMsymloc, spEM, plotseq.npEM

Examples

## Examine and plot water-level task data set.

## First, try a 3-component solution where no two coordinates are
## assumed i.d.
data(Waterdata)
set.seed(100)
## Not run: 
# a <- npEM(Waterdata[,3:10], mu0=3, bw=4) # Assume indep but not iid
# plot(a) # This produces 8 plots, one for each coordinate
## End(Not run)

## Next, same thing but pairing clock angles that are directly opposite one
## another (1:00 with 7:00, 2:00 with 8:00, etc.)
## Not run: 
# b <- npEM(Waterdata[,3:10], mu0=3, blockid=c(4,3,2,1,3,4,1,2), bw=4) # iid in pairs
# plot(b) # Now only 4 plots, one for each block
## End(Not run)
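
To see which columns the paired blockid in the second example groups together, the vector can be split by block label. This is a minimal sketch in base R; the variable names are illustrative, and the column positions are relative to the eight columns selected by Waterdata[,3:10]:

```r
# blockid used in the paired example: equal labels mark columns
# that are assumed identically distributed
blockid <- c(4, 3, 2, 1, 3, 4, 1, 2)

# For each block label, list the positions of the columns it contains
blocks <- split(seq_along(blockid), blockid)

# Number of blocks, which matches the number of plots in the paired example
n_blocks <- length(blocks)
```

Here blocks is a list with four elements, each holding the two column positions that share a label.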
