Clara_Medoids: Clustering large applications

Arguments

data: matrix or data frame.

clusters: the number of clusters.

samples: the number of samples to draw from the data set.

sample_size: the fraction of the data to draw in each sample iteration. It should be a float greater than 0.0 and less than or equal to 1.0.

distance_metric: a string specifying the distance method. One of "euclidean", "manhattan", "chebyshev", "canberra", "braycurtis", "pearson_correlation", "simple_matching_coefficient", "minkowski", "hamming", "jaccard_coefficient", "Rao_coefficient", "mahalanobis", "cosine".

minkowski_p: a numeric value specifying the Minkowski parameter, used when distance_metric = "minkowski".

threads: an integer specifying the number of cores to run in parallel. OpenMP is used to parallelize the different sample draws.

swap_phase: either TRUE or FALSE. If TRUE, both phases ('build' and 'swap') take place; if FALSE, only the 'build' phase runs. The 'swap' phase is the more computationally intensive of the two.

fuzzy: either TRUE or FALSE. If TRUE, probabilities of membership in each cluster are returned, based on the distances between the observations and the medoids.

verbose: either TRUE or FALSE, indicating whether progress is printed during clustering.

seed: an integer value for the random number generator (RNG).
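The distance_metric and minkowski_p arguments interact in one case only. The Minkowski distance of order p reduces to the Manhattan distance for p = 1 and to the Euclidean distance for p = 2, and it approaches the Chebyshev distance as p grows. The Python sketch below illustrates this for four of the listed metrics. It is not the package's C++ code; the helper names (minkowski, manhattan, chebyshev, distance) and the default minkowski_p = 1.0 are assumptions made for illustration only.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance of order p (p > 0) between equal-length vectors."""
    if p <= 0:
        raise ValueError("minkowski_p must be positive")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def manhattan(x, y):
    """Sum of absolute coordinate differences (Minkowski with p = 1)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x, y):
    """Largest absolute coordinate difference (limit of Minkowski as p grows)."""
    return max(abs(a - b) for a, b in zip(x, y))


# Hypothetical dispatcher: only a subset of the metrics accepted by
# distance_metric is sketched here; math.dist is the Euclidean distance.
METRICS = {"euclidean": math.dist, "manhattan": manhattan, "chebyshev": chebyshev}


def distance(x, y, distance_metric="euclidean", minkowski_p=1.0):
    """Dispatch on a metric name; minkowski_p is used only for "minkowski"."""
    if distance_metric == "minkowski":
        return minkowski(x, y, minkowski_p)
    if distance_metric not in METRICS:
        raise ValueError(f"metric not covered by this sketch: {distance_metric}")
    return METRICS[distance_metric](x, y)
```

For x = (0, 0) and y = (3, 4), this gives 5.0 (Euclidean), 7 (Manhattan) and 4 (Chebyshev). Passing "minkowski" with minkowski_p = 1 or 2 reproduces 7.0 and 5.0.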

Gaussian mixture models, k-means, mini-batch k-means, k-medoids and affinity propagation clustering, with options to plot, validate, predict (on new data) and estimate the optimal number of clusters. The package uses 'RcppArmadillo' to speed up the computationally intensive parts of its functions. For more information, see (i) "Clustering in an Object-Oriented Environment" by Anja Struyf, Mia Hubert and Peter Rousseeuw (1997), Journal of Statistical Software, <doi:10.18637/jss.v001.i04>; (ii) "Web-scale k-means clustering" by D. Sculley (2010), ACM Digital Library, <doi:10.1145/1772690.1772862>; (iii) "Armadillo: a template-based C++ library for linear algebra" by Sanderson et al. (2016), The Journal of Open Source Software, <doi:10.21105/joss.00026>; (iv) "Clustering by Passing Messages Between Data Points" by Brendan J. Frey and Delbert Dueck (2007), Science, Vol. 315, Issue 5814 (16 Feb 2007), pp. 972-976, <doi:10.1126/science.1136800>.
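CLARA (Clustering LARge Applications), as presented by Struyf, Hubert and Rousseeuw (1997), avoids running a full k-medoids search on every observation. It repeatedly draws a random subset of the data and runs a PAM-style 'build' step (plus an optional 'swap' step) on that subset. Each resulting medoid set is then scored by its total distance over the full data set, and the best set is kept. The Python sketch below follows that outline, with arguments mirroring samples, sample_size, swap_phase and seed. It is a sketch under stated assumptions, not the package's implementation. Euclidean distance is assumed throughout, the function names clara, build, swap and cost are hypothetical, and the greedy build step is a simplified stand-in for the original PAM BUILD.

```python
import math
import random


def cost(points, medoids):
    """Total distance from each point to its nearest medoid (Euclidean assumed)."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def build(points, k):
    """Simplified BUILD step: repeatedly add the point that lowers the cost most."""
    medoids = []
    for _ in range(k):
        candidates = [p for p in points if p not in medoids]
        medoids.append(min(candidates, key=lambda c: cost(points, medoids + [c])))
    return medoids


def swap(points, medoids):
    """SWAP step: replace a medoid by a non-medoid while the cost strictly drops."""
    medoids = list(medoids)
    best = cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(len(medoids)):
            for c in points:
                if c in medoids:
                    continue
                trial = medoids[:i] + [c] + medoids[i + 1:]
                trial_cost = cost(points, trial)
                if trial_cost < best:
                    medoids, best, improved = trial, trial_cost, True
    return medoids


def clara(data, clusters, samples, sample_size, swap_phase=True, seed=1):
    """Hypothetical CLARA sketch; argument names mirror Clara_Medoids."""
    if not 0.0 < sample_size <= 1.0:
        raise ValueError("sample_size must be in (0.0, 1.0]")
    rng = random.Random(seed)
    # Each sample holds at least `clusters` points so BUILD can pick k medoids.
    m = min(len(data), max(clusters, round(sample_size * len(data))))
    best_medoids, best_cost = None, math.inf
    for _ in range(samples):  # Clara_Medoids parallelizes these draws with OpenMP
        sample = rng.sample(data, m)
        medoids = build(sample, clusters)
        if swap_phase:
            medoids = swap(sample, medoids)
        total = cost(data, medoids)  # score on the FULL data set, not the sample
        if total < best_cost:
            best_medoids, best_cost = medoids, total
    # Assign every observation to its nearest medoid from the best sample.
    labels = [min(range(clusters), key=lambda j: math.dist(x, best_medoids[j]))
              for x in data]
    return best_medoids, labels, best_cost
```

Scoring each candidate medoid set on the full data set, rather than on the sample it came from, is what lets CLARA compare samples fairly. The sample draws are independent of one another, which is why they can be spread across the threads argument.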

