Initialization: Initialization for EM-like Algorithms

Description

These functions initialize EM-like algorithms for model-based clustering of the data X.spmd, and initialize the K-means algorithm by randomly picking samples from X.spmd.

The *.dmat functions are the ddmatrix versions.

Usage

initial.RndEM(PARAM)
initial.em(PARAM, MU = NULL)
initial.center(PARAM, MU = NULL)
initial.RndEM.dmat(PARAM)
initial.em.dmat(PARAM, MU = NULL)
initial.center.dmat(PARAM, MU = NULL)

Arguments

PARAM

an original set of parameters generated by set.global.

MU

a center matrix with dim = $p \times K$.

Value

The best starting point among all random starting points is returned as PARAM. The number of random starting points is assigned by set.global to a list variable CONTROL. See the help pages of initial.em and set.global for details.

Details

For initial.RndEM, the procedure randomly picks .pmclustEnv$CONTROL$RndEM.iter starting points from the data X.spmd and runs one E-step for each to obtain its log likelihood. The starting point with the highest log likelihood is then chosen as the one from which further EM iterations pursue the MLEs.

This function repeatedly runs initial.em with .pmclustEnv$CONTROL$RndEM.iter random starts and picks the best initialization among them.

For initial.em, the function takes X.spmd from the global environment and randomly picks $K$ observations as the centers of the $K$ groups. If MU is specified, then MU is used as the centers instead. The default identity dispersion in PARAM$SIGMA is used. One E-step is then called to obtain the log likelihood, and the classification is updated.

This function is the building block of the more elaborate RndEM initialization scheme in initial.RndEM. Several random starts should generally be tried before running the EM algorithms. This can help in two ways: fewer iterations to convergence and better classification results.
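The two procedures above can be illustrated with a serial sketch in plain R. This is not the pmclust implementation: it uses no MPI and no X.spmd, the function names one_start and rnd_em are hypothetical, and equal mixing proportions are an assumption made only for this illustration.

```r
# Serial sketch only: no MPI, no pmclust. All function names are hypothetical.

# One random start: use MU (a p x K matrix) if given, otherwise pick K rows
# of X as centers. Identity dispersion and equal mixing proportions are
# assumed, then a single E-step gives the log likelihood and classification.
one_start <- function(X, K, MU = NULL) {
  p <- ncol(X)
  if (is.null(MU)) {
    MU <- t(X[sample.int(nrow(X), K), , drop = FALSE])  # p x K
  }
  # Log density of N(MU[, k], I) for every observation (rows) and group (cols).
  log_dens <- vapply(seq_len(K), function(k) {
    d2 <- rowSums(sweep(X, 2, MU[, k])^2)
    -0.5 * d2 - 0.5 * p * log(2 * pi)
  }, numeric(nrow(X)))
  log_w <- log_dens + log(1 / K)               # add log mixing proportion
  m <- apply(log_w, 1, max)                    # log-sum-exp for stability
  log_mix <- m + log(rowSums(exp(log_w - m)))
  list(MU = MU,
       logL = sum(log_mix),
       class = max.col(log_w, ties.method = "first"))
}

# RndEM-style selection: try several random starts, keep the one with the
# highest log likelihood after its single E-step.
rnd_em <- function(X, K, n_starts = 10) {
  starts <- lapply(seq_len(n_starts), function(i) one_start(X, K))
  logLs <- vapply(starts, function(s) s$logL, numeric(1))
  starts[[which.max(logLs)]]
}

set.seed(1)
X <- as.matrix(iris[, 1:4])
best <- rnd_em(X, K = 3, n_starts = 20)
best$logL
table(best$class)
```

In this sketch n_starts plays the role that .pmclustEnv$CONTROL$RndEM.iter plays in the package, and the returned list stands in for the updated PARAM.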

For initial.center, if MU is given, then the centers will be assigned accordingly.
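As a rough serial analogue of K-means center initialization, the sketch below uses MU as the centers when it is supplied and otherwise picks K random observations. The nearest-center labelling step, the function name init_center, and the returned list are assumptions for illustration, not the package's documented behavior.

```r
# Serial sketch only: no MPI, no pmclust. init_center is a hypothetical name.
init_center <- function(X, K, MU = NULL) {
  if (is.null(MU)) {
    # No centers supplied: pick K random observations as centers (p x K).
    MU <- t(X[sample.int(nrow(X), K), , drop = FALSE])
  }
  stopifnot(nrow(MU) == ncol(X), ncol(MU) == K)
  # Squared Euclidean distance from every observation to every center.
  d2 <- vapply(seq_len(K), function(k) {
    rowSums(sweep(X, 2, MU[, k])^2)
  }, numeric(nrow(X)))
  # Assumed step: label each observation by its nearest center.
  list(MU = MU, class = max.col(-d2, ties.method = "first"))
}

X <- as.matrix(iris[, 1:4])
MU <- t(X[c(1, 51, 101), ])      # user-supplied centers, dim = p x K
init <- init_center(X, K = 3, MU = MU)
table(init$class)
```

Note that MU is stored with one column per center, matching the $p \times K$ layout described under Arguments.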

References

Programming with Big Data in R Website: http://r-pbd.org/

Maitra, R. (2009) “Initializing partition-optimization algorithms”, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6:1, 144-157.

Examples

# NOT RUN {
# Examples can be found in the help pages of em.step(),
# aecm.step(), apecm.step(), apecma.step(), and kmeans.step().

# Examples for the ddmatrix versions can be found in the help page of
# kmeans.step.dmat().
# }