MiniBatchKmeans

data

clusters

batch_size

number of times the algorithm will be run with different centroid seeds

num_init

the maximum number of clustering iterations

max_iters

fraction of the data used to initialize the centroids (applies if initializer is kmeans++ or optimal_init). Should be a float between 0.0 and 1.0.

init_fraction

the initialization method. One of optimal_init, quantile_init, kmeans++ or random. See Details for more information.

initializer

number of additional iterations to run after the best within-cluster sum of squared error has been calculated

early_stop_iter

either TRUE or FALSE, indicating whether progress is printed during clustering

verbose

a matrix of initial cluster centroids. The number of rows of the CENTROIDS matrix should equal the number of clusters, and the number of columns should equal the number of columns of the data.

CENTROIDS

a float number. If, at any iteration (iteration > 1 and iteration < max_iters), 'tol' is greater than the squared norm of the centroids, then kmeans has converged

tol

tolerance value for the 'optimal_init' initializer. The higher this value is, the farther apart from each other the centroids are.

tol_optimal_init

integer seed for the random number generator (RNG)

seed
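
The shape rule for CENTROIDS and the range rule for init_fraction can be checked before calling the function. The following is a minimal base R sketch: the variable names init_rows and initial_centroids are illustrative, and the way rows are subsampled here is an assumption for demonstration, not a description of how the package draws its initialization sample.

```r
# Base R only. `init_rows` and `initial_centroids` are illustrative names.
set.seed(1)                                # plays the role of `seed`
data <- matrix(rnorm(200 * 3), ncol = 3)   # 200 observations, 3 columns
clusters <- 4
init_fraction <- 0.25

# init_fraction is documented as a float between 0.0 and 1.0;
# a value of 0 would leave no rows to sample in this sketch
stopifnot(init_fraction > 0, init_fraction <= 1)

# Rows an initializer might look at when init_fraction = 0.25 (assumed scheme)
n_init <- ceiling(init_fraction * nrow(data))
init_rows <- sample(nrow(data), n_init)

# A user-supplied CENTROIDS matrix: one row per cluster,
# one column per column of the data
initial_centroids <- data[sample(init_rows, clusters), , drop = FALSE]

stopifnot(
  is.matrix(initial_centroids),
  nrow(initial_centroids) == clusters,
  ncol(initial_centroids) == ncol(data)
)
```

A matrix that passes these checks has the dimensions the CENTROIDS argument asks for.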

Mini-batch-k-means using RcppArmadillo

Gaussian mixture models, k-means, mini-batch-kmeans, k-medoids and affinity propagation clustering with the option to plot, validate, predict (new data) and estimate the optimal number of clusters. The package takes advantage of 'RcppArmadillo' to speed up the computationally intensive parts of the functions. For more information, see (i) "Clustering in an Object-Oriented Environment" by Anja Struyf, Mia Hubert, Peter Rousseeuw (1997), Journal of Statistical Software, <doi:10.18637/jss.v001.i04>; (ii) "Web-scale k-means clustering" by D. Sculley (2010), ACM Digital Library, <doi:10.1145/1772690.1772862>; (iii) "Armadillo: a template-based C++ library for linear algebra" by Sanderson et al (2016), The Journal of Open Source Software, <doi:10.21105/joss.00026>; (iv) "Clustering by Passing Messages Between Data Points" by Brendan J. Frey and Delbert Dueck, Science 16 Feb 2007: Vol. 315, Issue 5814, pp. 972-976, <doi:10.1126/science.1136800>.
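
The mini-batch update from reference (ii) can be sketched in base R as follows. This is a simplified illustration of the algorithm in that paper, not the package's RcppArmadillo implementation: it uses random initialization only, runs a fixed number of iterations with no tolerance check or early stopping, and the function name mini_batch_kmeans_sketch is hypothetical.

```r
# Minimal base R sketch of the mini-batch k-means update (Sculley, 2010).
# Not the package's C++ code; `mini_batch_kmeans_sketch` is a hypothetical name.
mini_batch_kmeans_sketch <- function(data, clusters, batch_size = 10,
                                     max_iters = 100, seed = 1) {
  set.seed(seed)
  # Random initialization: pick `clusters` distinct rows as starting centroids
  centroids <- data[sample(nrow(data), clusters), , drop = FALSE]
  counts <- numeric(clusters)   # number of points assigned to each centroid so far

  for (iter in seq_len(max_iters)) {
    batch <- data[sample(nrow(data), batch_size), , drop = FALSE]

    # Squared Euclidean distance from every batch point to every centroid
    d2 <- outer(rowSums(batch^2), rowSums(centroids^2), "+") -
      2 * batch %*% t(centroids)
    # Cache the nearest centroid for each batch point before any update
    nearest <- max.col(-d2, ties.method = "first")

    # Per-centroid learning rate 1 / count, so each centroid is a running mean
    for (i in seq_len(nrow(batch))) {
      k <- nearest[i]
      counts[k] <- counts[k] + 1
      eta <- 1 / counts[k]
      centroids[k, ] <- (1 - eta) * centroids[k, ] + eta * batch[i, ]
    }
  }
  centroids
}

# Two groups of 50 points each, centred near (0, 0) and (10, 10)
set.seed(42)
toy <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 10), ncol = 2))
centers <- mini_batch_kmeans_sketch(toy, clusters = 2, batch_size = 20)
```

Because each iteration touches only batch_size rows, the cost per iteration does not grow with the number of observations, which is the motivation for the mini-batch variant.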

Examples
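
A hedged usage sketch, using only the argument names listed on this page. The toy data and the specific argument values are arbitrary choices for illustration, and the call is skipped when the ClusterR package is not installed. The result is inspected with str() rather than by assuming particular field names.

```r
# Toy data: two groups of 100 points each, centred near (0, 0) and (6, 6)
set.seed(1)
dat <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
             matrix(rnorm(200, mean = 6), ncol = 2))

# Only run the clustering call when ClusterR is available
if (requireNamespace("ClusterR", quietly = TRUE)) {
  fit <- ClusterR::MiniBatchKmeans(
    data = dat,
    clusters = 2,
    batch_size = 20,
    num_init = 5,              # run with 5 different centroid seeds
    max_iters = 100,
    init_fraction = 0.2,       # applies to kmeans++ and optimal_init
    initializer = "kmeans++",
    early_stop_iter = 10,
    verbose = FALSE,
    seed = 1
  )

  # Inspect the structure of whatever the call returned
  str(fit)
}
```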