KMeans_rcpp(data, clusters, num_init = 1, max_iters = 100, initializer = "optimal_init", fuzzy = FALSE, threads = 1, verbose = FALSE, CENTROIDS = NULL, tol = 1e-04, tol_optimal_init = 0.5, seed = 1)
KMeans_rcpp allows for multiple initializations (which can be parallelized if OpenMP is available).
Besides the optimal_init, quantile_init, random and kmeans++ initializations, one can specify the centroids directly using the CENTROIDS parameter.
The running time and convergence of the algorithm can be adjusted using the num_init, max_iters and tol parameters.
If num_init > 1, KMeans_rcpp returns the attributes of the best initialization, using the within-cluster sum of squared errors as the criterion.
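The selection rule for multiple initializations can be illustrated with a minimal sketch. It does not show the package's internal code: base R's stats::kmeans stands in for KMeans_rcpp, and the simulated matrix x and the names fits, wcss and best are assumptions made for the example.

```r
# Illustrative only: stats::kmeans stands in for KMeans_rcpp.
# Two simulated groups of 50 observations each (hypothetical data).
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

num_init <- 5

# Run the algorithm once per initialization, each from a different random start.
fits <- lapply(seq_len(num_init), function(i) {
  set.seed(i)
  kmeans(x, centers = 2, iter.max = 100, nstart = 1)
})

# Total within-cluster sum of squared errors of each run.
wcss <- vapply(fits, function(f) f$tot.withinss, numeric(1))

# Keep the run with the smallest error.
best <- fits[[which.min(wcss)]]
```

Raising num_init makes a poor local optimum less likely at the cost of proportionally more running time, which is why the runs are worth parallelizing.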
---------------initializers----------------------
optimal_init : this initializer adds rows of the data incrementally, while checking that they do not already exist in the centroid-matrix
quantile_init : initialization of centroids by using the cumulative distance between observations and by removing potential duplicates
kmeans++ : kmeans++ initialization. References : http://theory.stanford.edu/~sergei/papers/kMeansPP-soda.pdf and http://stackoverflow.com/questions/5466323/how-exactly-does-k-means-work
random : random selection of data rows as initial centroids
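To make the difference between the random and kmeans++ initializers concrete, here is a minimal base R sketch of both. It is not the package's implementation: the helper kmeanspp_seed and the simulated matrix x are hypothetical, and the sketch assumes the data contain at least k distinct rows.

```r
# Hypothetical helper, not part of the package: k-means++ seeding in base R.
kmeanspp_seed <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  idx <- integer(k)

  # First centroid: one data row chosen uniformly at random.
  idx[1] <- sample.int(n, 1)

  # Squared distance from every row to its nearest chosen centroid.
  d2 <- rowSums((x - matrix(x[idx[1], ], n, ncol(x), byrow = TRUE))^2)

  for (j in seq_len(k)[-1]) {
    # Next centroid: sampled with probability proportional to that distance,
    # so rows already chosen (distance 0) cannot be picked again.
    idx[j] <- sample.int(n, 1, prob = d2)
    d2_new <- rowSums((x - matrix(x[idx[j], ], n, ncol(x), byrow = TRUE))^2)
    d2 <- pmin(d2, d2_new)
  }

  x[idx, , drop = FALSE]
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)   # hypothetical data: 100 rows, 2 columns

# kmeans++ style seeding: centroids tend to be spread far apart.
centroids_pp <- kmeanspp_seed(x, k = 3)

# 'random' style seeding: plain random selection of data rows.
centroids_random <- x[sample.int(nrow(x), 3), , drop = FALSE]
```

A matrix built this way, with one row per cluster and the same columns as the data, is presumably the kind of input the CENTROIDS parameter accepts when the centroids are supplied by hand.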
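library(ClusterR)
data(dietary_survey_IBS)
# drop the last column, then center and scale the remaining columns
dat = dietary_survey_IBS[, -ncol(dietary_survey_IBS)]
dat = center_scale(dat)
# two clusters, keeping the best of five 'optimal_init' initializations
km = KMeans_rcpp(dat, clusters = 2, num_init = 5, max_iters = 100, initializer = 'optimal_init')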