data

the number of Gaussian mixture components

gaussian_comps

the distance measure used when seeding the initial means and during k-means clustering. One of eucl_dist (Euclidean) or maha_dist (Mahalanobis).

dist_mode

how the initial means are seeded before running the k-means and/or EM algorithms. One of static_subset, random_subset, static_spread or random_spread.

seed_mode

the number of iterations of the k-means algorithm

km_iter

the number of iterations of the EM algorithm

em_iter

either TRUE or FALSE; enable or disable printing of progress during the k-means and EM algorithms

verbose

the variance floor (smallest allowed value) for the diagonal covariances

var_floor

integer seed value for the random number generator (RNG)

seed
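
The arguments above can be combined as in the following minimal sketch. It assumes the ClusterR package is installed; the simulated two-cluster data and the object names `x` and `fit` are illustrative and not taken from the package documentation. The specific argument values are arbitrary choices for the example, not recommended defaults.

```r
# Minimal sketch: fit a 2-component Gaussian mixture with GMM().
# Assumption: ClusterR is installed; the toy data below are made up.
library(ClusterR)

set.seed(1)
# Two well-separated 2-D clusters of 100 points each (hypothetical data)
x <- rbind(
  matrix(rnorm(200, mean = 0, sd = 1), ncol = 2),
  matrix(rnorm(200, mean = 5, sd = 1), ncol = 2)
)

fit <- GMM(
  x,
  gaussian_comps = 2,            # number of mixture components
  dist_mode = "eucl_dist",       # distance for seeding and k-means
  seed_mode = "random_subset",   # how the initial means are seeded
  km_iter = 10,                  # k-means iterations
  em_iter = 10,                  # EM iterations
  verbose = FALSE,               # no progress printing
  var_floor = 1e-10,             # smallest allowed diagonal variance
  seed = 1                       # RNG seed for reproducibility
)

# Inspect the fitted component means and mixing weights
print(fit$centroids)
print(fit$weights)
```

With `seed` fixed, repeated calls on the same data should give the same fit, which helps when comparing different `dist_mode` or `seed_mode` settings.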

Gaussian mixture models, k-means, mini-batch-kmeans, k-medoids and affinity propagation clustering with the option to plot, validate, predict (new data) and estimate the optimal number of clusters. The package takes advantage of 'RcppArmadillo' to speed up the computationally intensive parts of the functions. For more information, see (i) "Clustering in an Object-Oriented Environment" by Anja Struyf, Mia Hubert, Peter Rousseeuw (1997), Journal of Statistical Software, <doi:10.18637/jss.v001.i04>; (ii) "Web-scale k-means clustering" by D. Sculley (2010), ACM Digital Library, <doi:10.1145/1772690.1772862>; (iii) "Armadillo: a template-based C++ library for linear algebra" by Sanderson et al (2016), The Journal of Open Source Software, <doi:10.21105/joss.00026>; (iv) "Clustering by Passing Messages Between Data Points" by Brendan J. Frey and Delbert Dueck, Science 16 Feb 2007: Vol. 315, Issue 5814, pp. 972-976, <doi:10.1126/science.1136800>.

GMM: Gaussian Mixture Model clustering
