fastclara: FastCLARA

Description

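Clustering Large Applications (CLARA) with the improvements of Schubert and Rousseeuw (2019), which increase scalability in the number of clusters. CLARA runs k-medoids clustering on several random samples of the data and keeps the medoids with the lowest total deviation on the full data set. Schubert and Rousseeuw (2019) also recommend twice the sample size of classic CLARA (80 + 4k rather than 40 + 2k) to improve quality.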
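The sampling scheme described above can be sketched in base R as follows. This is a simplified illustration, not the package's implementation. The function names `clara_sketch` and `alt_kmedoids` are hypothetical. The inner clustering uses a simple alternating (Voronoi-iteration) k-medoids step as a stand-in for FastPAM. The sketch assumes Euclidean distance on a coordinate matrix `X` so that each sample's distances can be computed on the fly. The sample-size rule and the carry-over of the best medoids when `independent = FALSE` follow one reading of the argument descriptions.

```r
# Hypothetical sketch of the CLARA scheme; NOT the fastkmedoids implementation.
# Assumes X is a numeric matrix (rows = observations) and Euclidean distance.

# Stand-in inner solver: alternating (Voronoi-iteration) k-medoids.
# FastCLARA runs FastPAM on each sample instead.
alt_kmedoids <- function(D, k, init, maxiter = 100L) {
  med <- init
  for (it in seq_len(maxiter)) {
    # Assign each sampled point to its nearest medoid (ties: first medoid)
    assign <- apply(D[, med, drop = FALSE], 1, which.min)
    # In each cluster, choose the member with the smallest total distance
    new_med <- vapply(seq_len(k), function(j) {
      members <- which(assign == j)
      if (length(members) == 0L) return(as.integer(med[j]))
      as.integer(members[which.min(colSums(D[members, members, drop = FALSE]))])
    }, integer(1))
    if (all(new_med == med)) break
    med <- new_med
  }
  med
}

clara_sketch <- function(X, k, numsamples = 5L, sampling = 0.25,
                         independent = FALSE, seed = 123456789L) {
  set.seed(seed)
  n <- nrow(X)
  # Values below 1 are a fraction of n; other values are an absolute size
  size <- if (sampling < 1) floor(sampling * n) else floor(sampling)
  size <- min(max(size, k), n)
  best <- NULL
  best_cost <- Inf
  for (s in seq_len(numsamples)) {
    idx <- sample.int(n, size)
    # Unless sampling independently, put the best medoids so far into the sample
    if (!independent && !is.null(best)) {
      idx <- unique(c(best, idx))[seq_len(size)]
    }
    Ds <- as.matrix(dist(X[idx, , drop = FALSE]))
    # Start from the first k sample points (the carried-over medoids, if any)
    med <- idx[alt_kmedoids(Ds, k, init = seq_len(k))]
    # Score the medoids on ALL n points, not just the sample
    Dfull <- sapply(med, function(m) sqrt(colSums((t(X) - X[m, ])^2)))
    cost <- sum(apply(Dfull, 1, min))
    if (cost < best_cost) {
      best <- med
      best_cost <- cost
    }
  }
  list(medoids = best, cost = best_cost)
}
```

For example, `clara_sketch(matrix(c(0, 1, 2, 100, 101, 102), ncol = 1), k = 2, sampling = 6)` samples all six points and returns observations 2 and 5 as medoids. The key design point of CLARA is visible in the last part of the loop: clustering happens on the small sample, but candidate medoids are always scored against the whole data set.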

Usage

fastclara(
  rdist,
  n,
  k,
  maxiter = 0L,
  initializer = "LAB",
  fasttol = 1,
  numsamples = 5L,
  sampling = 0.25,
  independent = FALSE,
  seed = 123456789L
)
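The `rdist` argument is described as the lower triangle of the distance matrix stored column-wise, passed together with the number of observations `n`. Base R's `dist()` stores distances in this layout (lower triangle by columns, diagonal omitted), so it is a natural way to prepare both arguments. Whether `fastclara` accepts a `dist` object directly or needs the plain vector from `as.vector()` is an assumption to check against the package. The helper `tri_index` below is hypothetical. It only shows where the distance between observations i and j sits in that vector.

```r
# Hypothetical helper: position of d(i, j) in a column-wise lower triangle
# without the diagonal, the layout used by base R's dist() objects.
tri_index <- function(i, j, n) {
  if (i == j) stop("the diagonal is not stored")
  if (i > j) { tmp <- i; i <- j; j <- tmp }  # distances are symmetric
  n * (i - 1) - i * (i - 1) / 2 + j - i
}

# Three observations: (0, 0), (3, 4), (0, 4)
x <- matrix(c(0, 3, 0, 0, 4, 4), ncol = 2)
d <- dist(x)                      # Euclidean distances by default
rdist <- as.vector(d)             # c(d(1,2), d(1,3), d(2,3))
n <- attr(d, "Size")              # number of observations
print(rdist)                      # 5 4 3
print(rdist[tri_index(3, 2, n)])  # 3, the distance between observations 2 and 3
```

These two values would then be supplied as the first two arguments of `fastclara(rdist, n, k)`. The vector holds only n(n-1)/2 entries, about half of the full n x n matrix.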

Value

An object of S4 class KMedoids.

Arguments

rdist: The distance matrix, given as its lower triangle stored column-wise
n: The number of observations
k: The number of clusters to produce
maxiter: The maximum number of iterations. Default: 0
initializer: Initialization method: either "BUILD" (as used in classic PAM) or "LAB" (Linear Approximative BUILD). Default: "LAB"
fasttol: Tolerance for the fast swapping behavior (larger values may accept worse swaps). Default: 1.0, which performs any additional swap that still gives an improvement. When set to 0, an additional swap is executed only if it appears to be independent (i.e., its improvement did not decrease when the first swap was executed).
numsamples: Number of samples to draw (i.e., the number of CLARA iterations). Default: 5
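sampling: Sampling rate or sample size. Values less than 1 are treated as a fraction of n (e.g., 0.10 draws n*0.10 observations); larger values are taken as an absolute number of observations. Default: 0.25. Schubert and Rousseeuw (2019) suggest a sample size of 80 + 4*k, which can be requested by passing that value directly.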
independent: If TRUE, each sample is drawn independently; if FALSE, the best medoids found so far are kept in the next sample. Default: FALSE
seed: Seed for random number generator. Default: 123456789
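The `fasttol` description fixes two endpoints. A value of 1 accepts any additional swap that still improves the cost, and 0 accepts it only if its improvement has not decreased since the first swap was executed. The sketch below encodes a linear interpolation between those endpoints. This is one plausible reading, not necessarily the package's exact rule. The function name `accept_extra_swap` is hypothetical, and gains are improvements in total deviation, so larger is better.

```r
# Hypothetical acceptance test for an additional swap in the FastPAM swap
# phase. ASSUMPTION: linear interpolation between the documented endpoints.
#   gain_before: improvement estimated before the first swap was executed
#   gain_now:    improvement of the same swap re-evaluated afterwards
accept_extra_swap <- function(gain_before, gain_now, fasttol = 1) {
  # fasttol = 1: any strictly positive gain is accepted
  # fasttol = 0: the gain must not have decreased (the swap looks independent)
  gain_now > 0 && gain_now >= (1 - fasttol) * gain_before
}

accept_extra_swap(10, 4, fasttol = 1)   # TRUE: still an improvement
accept_extra_swap(10, 4, fasttol = 0)   # FALSE: gain dropped from 10 to 4
accept_extra_swap(10, 10, fasttol = 0)  # TRUE: unaffected by the first swap
```

Intermediate values such as `fasttol = 0.5` would, under this assumption, accept a swap whose gain kept at least half of its original estimate.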

References

Erich Schubert and Peter J. Rousseeuw (2019). "Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms". https://arxiv.org/abs/1810.05691