Clustering Large Applications (CLARA) with the FastPAM improvements, which increase scalability in the number of clusters. This variant also defaults to twice the sample size of classic CLARA, to improve quality (Schubert and Rousseeuw, 2019).
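The sampling scheme can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the fastclara implementation. The helper names (total_cost, naive_pam, clara_sketch) are hypothetical. The input is a full n x n distance matrix. The inner solver is a naive greedy BUILD followed by best-improvement swaps, not the FastPAM swap with LAB initialization. Each sample is clustered on its own, but the resulting medoids are scored on all n observations, and the best set across samples is kept. The default sample size here is the 80 + 4*k value from Schubert and Rousseeuw (2019).

```r
# Minimal CLARA sketch (hypothetical helpers, NOT the fastclara internals).
# D is a full n x n distance matrix, e.g. as.matrix(dist(x)).

# Total deviation: every point pays the distance to its nearest medoid.
total_cost <- function(D, medoids) {
  sum(apply(D[, medoids, drop = FALSE], 1, min))
}

# Naive PAM on a small distance matrix: greedy BUILD, then repeatedly apply
# the best improving medoid/non-medoid swap. FastPAM and LAB are not used.
naive_pam <- function(D, k) {
  n <- nrow(D)
  medoids <- integer(0)
  for (step in seq_len(k)) {
    cand <- setdiff(seq_len(n), medoids)
    costs <- vapply(cand, function(o) total_cost(D, c(medoids, o)), numeric(1))
    medoids <- c(medoids, cand[which.min(costs)])
  }
  repeat {
    best_cost <- total_cost(D, medoids)
    best_set <- NULL
    for (i in seq_along(medoids)) {
      for (o in setdiff(seq_len(n), medoids)) {
        trial <- medoids
        trial[i] <- o
        trial_cost <- total_cost(D, trial)
        if (trial_cost < best_cost - 1e-12) {
          best_cost <- trial_cost
          best_set <- trial
        }
      }
    }
    if (is.null(best_set)) break  # no improving swap left
    medoids <- best_set
  }
  medoids
}

# CLARA outer loop: cluster several samples, score each medoid set on ALL
# observations, keep the best one. Note: set.seed() changes the global RNG.
clara_sketch <- function(D, k, numsamples = 5L, samplesize = 80L + 4L * k,
                         independent = FALSE, seed = 123456789L) {
  n <- nrow(D)
  stopifnot(k <= n)
  set.seed(seed)
  samplesize <- min(n, max(k, samplesize))
  best_medoids <- integer(0)
  best_cost <- Inf
  for (s in seq_len(numsamples)) {
    # Unless independent = TRUE, the best medoids so far join the next sample.
    keep <- if (independent) integer(0) else best_medoids
    rest <- setdiff(seq_len(n), keep)
    idx <- c(keep, rest[sample.int(length(rest), samplesize - length(keep))])
    medoids <- idx[naive_pam(D[idx, idx, drop = FALSE], k)]
    cost <- total_cost(D, medoids)
    if (cost < best_cost) {
      best_cost <- cost
      best_medoids <- medoids
    }
  }
  list(medoids = best_medoids, cost = best_cost)
}
```

Scoring on the full data set, rather than on the sample, is what lets CLARA compare medoid sets that came from different samples.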
fastclara(
rdist,
n,
k,
maxiter = 0L,
initializer = "LAB",
fasttol = 1,
numsamples = 5L,
sampling = 0.25,
independent = FALSE,
seed = 123456789L
)
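A minimal call can be sketched as follows. It assumes that the package providing fastclara has been attached with library(). The rdist argument is built with stats::dist(), which stores the lower triangle of the distance matrix column by column. The call itself only runs when fastclara is available, so the rest of the snippet works in a plain R session.

```r
# Two well-separated groups of 50 two-dimensional points each.
set.seed(42)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))
n <- nrow(x)

# dist() returns the lower triangle, stored column by column.
d <- dist(x)
stopifnot(length(d) == n * (n - 1) / 2)

# Only call fastclara if it has been attached (assumption: via library()).
if (exists("fastclara", mode = "function")) {
  res <- fastclara(d, n = n, k = 2L)
  print(res)
}
```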
rdist: The distance matrix, given as its lower triangle stored column by column (the layout produced by dist()).
n: The number of observations.
k: The number of clusters to produce.
maxiter: The maximum number of iterations. Default: 0
initializer: Either "BUILD" (used in classic PAM) or "LAB" (linear approximative BUILD). Default: "LAB"
fasttol: Tolerance for fast swapping behavior (may perform worse swaps). Default: 1.0, which means to perform any additional swap that gives an improvement. When set to 0, an additional swap is only executed if it appears to be independent (i.e., the improvement resulting from the swap has not decreased after the first swap was executed).
numsamples: Number of samples to draw (i.e., iterations). Default: 5
sampling: Sample size. Values below 1 are treated as a fraction of n, e.g. 0.10 means n*0.10 observations. Schubert and Rousseeuw (2019) recommend 80 + 4*k; the default in the usage above is 0.25, i.e., 25% of the observations.
independent: If TRUE, draw each sample independently, i.e., do not keep the previous best medoids in the next sample. Default: FALSE
seed: Seed for the random number generator. Default: 123456789
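Two of the arguments above involve small computations that are easy to get wrong, so here is a short R sketch of both. The helper names dist_index and sample_size are hypothetical. dist_index uses the indexing formula documented in R's ?dist help page to find the distance between observations i and j in the column-wise lower-triangle vector passed as rdist. sample_size applies the relative-versus-absolute rule described for sampling. Truncating a fractional sample size toward zero is an assumption of this sketch, not documented fastclara behavior.

```r
# Position of d(i, j) in a lower triangle stored column by column
# (formula from R's ?dist documentation, 1-based indices, i != j).
dist_index <- function(i, j, n) {
  stopifnot(i != j)
  lo <- min(i, j)
  hi <- max(i, j)
  n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo
}

# Sample size: values below 1 are a fraction of n, others are absolute.
# Assumption: fractional sizes are truncated toward zero.
sample_size <- function(sampling, n) {
  if (sampling < 1) as.integer(floor(sampling * n)) else as.integer(sampling)
}
```

For example, with n = 1000 and k = 5, sample_size(0.25, 1000) gives 250 observations, while the paper's recommendation 80 + 4*5 gives an absolute sample of 100.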
Value: an object of S4 class KMedoids.
Erich Schubert and Peter J. Rousseeuw. "Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms." 2019. https://arxiv.org/abs/1810.05691