pmclust-package: Parallel Model-Based Clustering

Description

The pmclust package provides model-based (unsupervised) clustering for high-dimensional and very large data sets, especially in a distributed setting. It uses Rmpi to run a parallel version of the expectation-maximization (EM) algorithm for finite mixtures of Gaussian models, which are assumed to have unstructured dispersion matrices. By default, the implementation follows the single program multiple data (SPMD) programming model. The code can be executed through Rmpi and is independent of most MPI applications. See the High Performance Statistical Computing (HPSC) website for more information, documents, and examples.

Details

Package:  pmclust
Type:     Package
License:  GPL
LazyLoad: yes

The main function is em.step.spmd, which implements the parallel EM algorithm for multivariate Gaussian mixture models with unstructured dispersions. It groups a data matrix X.spmd into K clusters, where X.spmd is potentially huge and is taken from the global environment .GlobalEnv.
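The SPMD idea behind a parallel EM step can be illustrated with a minimal sketch. This is not the pmclust implementation and does not use Rmpi: it is written in plain Python, restricted to 2-dimensional data, and simulates the processors as a list of data chunks. All function names (`log_density`, `local_stats`, `allreduce_sum`, `m_step`, `em`) are hypothetical. Each simulated rank computes weighted sufficient statistics from its own rows only, a sum across ranks plays the role of an MPI all-reduce, and every rank could then apply the same M-step.

```python
import math
import random

def log_density(x, mu, S):
    """Log density of a 2-D Gaussian with a full (unstructured) covariance S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form via the closed-form inverse of a 2x2 matrix.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * q

def local_stats(chunk, pis, mus, Ss):
    """E-step on one rank's rows: a flat list of weighted sums plus log-likelihood."""
    K = len(pis)
    out = [0.0] * (6 * K + 1)  # per cluster: n, sx, sy, sxx, sxy, syy; last: loglik
    for x in chunk:
        logs = [math.log(pis[k]) + log_density(x, mus[k], Ss[k]) for k in range(K)]
        m = max(logs)
        lse = m + math.log(sum(math.exp(v - m) for v in logs))  # log-sum-exp
        out[-1] += lse
        for k in range(K):
            r = math.exp(logs[k] - lse)  # posterior probability of cluster k
            terms = (1.0, x[0], x[1], x[0] * x[0], x[0] * x[1], x[1] * x[1])
            for i, t in enumerate(terms):
                out[6 * k + i] += r * t
    return out

def allreduce_sum(parts):
    """Stand-in for an MPI all-reduce: element-wise sum over all ranks."""
    return [sum(col) for col in zip(*parts)]

def m_step(total, K):
    """M-step from the globally summed statistics (identical on every rank)."""
    N = sum(total[6 * k] for k in range(K))
    pis, mus, Ss = [], [], []
    for k in range(K):
        n, sx, sy, sxx, sxy, syy = total[6 * k:6 * k + 6]
        mx, my = sx / n, sy / n
        cxy = sxy / n - mx * my
        pis.append(n / N)
        mus.append((mx, my))
        Ss.append(((sxx / n - mx * mx, cxy), (cxy, syy / n - my * my)))
    return pis, mus, Ss

def em(chunks, pis, mus, Ss, iters=20):
    """Run a fixed number of EM iterations; return parameters and loglik history."""
    history = []
    for _ in range(iters):
        total = allreduce_sum([local_stats(c, pis, mus, Ss) for c in chunks])
        history.append(total[-1])
        pis, mus, Ss = m_step(total, len(pis))
    return pis, mus, Ss, history

# Synthetic data: two well-separated clusters, split across 4 simulated ranks.
random.seed(1)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]
data += [(random.gauss(5, 1), random.gauss(5, 1)) for _ in range(100)]
chunks = [data[i::4] for i in range(4)]
identity = ((1.0, 0.0), (0.0, 1.0))
init = ([0.5, 0.5], [(1.0, 0.0), (4.0, 6.0)], [identity, identity])
pis, mus, Ss, history = em(chunks, *init)
print(pis, mus)
```

Because only the small vector of summed statistics crosses rank boundaries, the communication cost per iteration in this sketch depends on K and the dimension, not on the number of rows each rank holds.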

The other main functions, aecm.step.spmd, apecm1.step.spmd, and apecm2.step.spmd, may outperform em.step.spmd in computing time and in the number of iterations needed to converge.

kmeans.step.spmd provides the fastest clustering among the algorithms above, but it is restricted to Euclidean distance and spherical dispersions.
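To show why k-means is cheap and why it implies Euclidean distance, here is a minimal sketch of Lloyd's iteration in the same distributed style. It is plain Python, not the code of kmeans.step.spmd, and the names `local_sums` and `kmeans` are hypothetical. Each simulated rank assigns its own rows to the nearest center by squared Euclidean distance and returns only per-cluster counts and coordinate sums, which are then added across ranks.

```python
def local_sums(chunk, centers):
    """One rank's work: assign rows to the nearest center, return counts and sums."""
    K, d = len(centers), len(centers[0])
    counts = [0] * K
    sums = [[0.0] * d for _ in range(K)]
    for x in chunk:
        # Hard assignment by squared Euclidean distance (no covariance involved).
        k = min(range(K),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centers[j])))
        counts[k] += 1
        for i, a in enumerate(x):
            sums[k][i] += a
    return counts, sums

def kmeans(chunks, centers, iters=10):
    """Lloyd's algorithm where chunks stand in for the rows held by each rank."""
    K, d = len(centers), len(centers[0])
    for _ in range(iters):
        parts = [local_sums(c, centers) for c in chunks]
        # Stand-in for an MPI all-reduce: add counts and sums over all ranks.
        counts = [sum(p[0][k] for p in parts) for k in range(K)]
        sums = [[sum(p[1][k][i] for p in parts) for i in range(d)]
                for k in range(K)]
        # Keep the old center if a cluster received no points.
        centers = [tuple(s / counts[k] for s in sums[k]) if counts[k] else centers[k]
                   for k in range(K)]
    return centers

# Two simulated ranks, each holding two rows.
chunks = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
print(kmeans(chunks, [(1.0, 1.0), (9.0, 9.0)]))
```

Each row contributes to exactly one cluster and no covariance matrix is estimated, which is the source of both the speed and the restriction to spherical clusters.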

References

High Performance Statistical Computing (HPSC) Website: http://thirteen-01.stat.iastate.edu/snoweye/hpsc/

Chen, W.-C. and Maitra, R. (2011) Model-based clustering of regression time series data via APECM -- an AECM algorithm sung to an even faster beat, Statistical Analysis and Data Mining, 4, 567-578.

Chen, W.-C. and Ostrouchov, G. (2012) Parallel Model-Based Clustering for Finite Mixture Gaussian Models, (in preparation).

Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society Series B, 39, 1-38.

Lloyd, S. P. (1982) Least squares quantization in PCM, IEEE Transactions on Information Theory, 28, 129-137.

Meng, X.-L. and Van Dyk, D. (1997) The EM Algorithm -- an Old Folk-song Sung to a Fast New Tune, Journal of the Royal Statistical Society Series B, 59, 511-567.

Examples

# From a command-line shell, run each demo on 4 processors:
mpirun -np 4 Rscript -e \
  'demo(ex_em, package = "pmclust", ask = FALSE, echo = FALSE)'
mpirun -np 4 Rscript -e \
  'demo(ex_aecm, package = "pmclust", ask = FALSE, echo = FALSE)'
mpirun -np 4 Rscript -e \
  'demo(ex_apecm1, package = "pmclust", ask = FALSE, echo = FALSE)'
mpirun -np 4 Rscript -e \
  'demo(ex_apecm2, package = "pmclust", ask = FALSE, echo = FALSE)'
mpirun -np 4 Rscript -e \
  'demo(ex_kmeans, package = "pmclust", ask = FALSE, echo = FALSE)'
