KmeansPP: Perform the k-means++ clustering algorithm on a data matrix.
Description
A parallel and scalable implementation of the algorithm described in
Ostrovsky, Rafail, et al. "The effectiveness of Lloyd-type methods for
the k-means problem." Journal of the ACM (JACM) 59.6 (2012): 28.
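The seeding idea behind k-means++ can be illustrated in a few lines of base R. This is a minimal sketch of the commonly described D^2 sampling step, not the package's parallel implementation: the helper name kmeanspp_seed is hypothetical, squared Euclidean distance is assumed, and the exact seeding rule used by KmeansPP may differ.

```r
# Hypothetical helper (not part of the package): k-means++ style seeding.
# The first center is drawn uniformly at random; each further center is drawn
# with probability proportional to the squared Euclidean distance between a
# sample and its nearest already-chosen center.
kmeanspp_seed <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  idx <- integer(k)
  idx[1] <- sample.int(n, 1)
  # Squared distance from every row to its nearest chosen center so far
  d2 <- rowSums(sweep(x, 2, x[idx[1], ])^2)
  for (j in seq_len(k)[-1]) {
    idx[j] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(x, 2, x[idx[j], ])^2))
  }
  x[idx, , drop = FALSE]
}

set.seed(1)
seeds <- kmeanspp_seed(iris[, 1:4], 3)
dim(seeds)
```

Samples far from every chosen center are more likely to be picked next, which spreads the initial centers across the data.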
Arguments
data
Data file name on disk (NUMA optimized) or in-memory data matrix
centers
The number of centers (i.e., k)
nrow
The number of samples in the dataset
ncol
The number of features in the dataset
nstart
The number of iterations of kmeans++ to run
nthread
The number of parallel threads to run
dist.type
The dissimilarity metric to use; one of c("taxi", "eucl", "cos")
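As an illustration of the dist.type options, the sketch below computes three dissimilarities between two vectors in base R. It assumes that "taxi" means Manhattan (taxicab) distance, "eucl" means Euclidean distance, and "cos" means one minus cosine similarity; the helper name dissimilarity is hypothetical and the package's exact definitions may differ.

```r
# Hypothetical helper: dissimilarity between two numeric vectors.
# The meaning assigned to each abbreviation is an assumption.
dissimilarity <- function(a, b, dist.type = c("taxi", "eucl", "cos")) {
  dist.type <- match.arg(dist.type)
  switch(dist.type,
    taxi = sum(abs(a - b)),                                   # Manhattan
    eucl = sqrt(sum((a - b)^2)),                              # Euclidean
    cos  = 1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2))) # 1 - cosine similarity
  )
}

a <- c(1, 0)
b <- c(0, 1)
dissimilarity(a, b, "taxi")  # 2
dissimilarity(a, b, "eucl")  # sqrt(2), about 1.414
dissimilarity(a, b, "cos")   # 1
```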
Value
A list containing the attributes of the output.
cluster: A vector of integers (from 1:k) indicating the cluster to
which each point is allocated.
centers: A matrix of cluster centers.
size: The number of points in each cluster.
energy: The sum of distances from each sample to its closest cluster center.
best.start: The sum of distances from each sample to its closest cluster center.
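To show how these components relate to one another, the sketch below recomputes size and an energy-like total from a cluster vector and a centers matrix. It uses base R's stats::kmeans purely as a stand-in to obtain values of the right shape, and it assumes plain Euclidean distance; whether KmeansPP sums distances or squared distances is not stated here.

```r
x <- as.matrix(iris[, 1:4])

# Stand-in fit from base R, used only to obtain a cluster vector and a
# centers matrix shaped like the components described above.
set.seed(42)
fit <- kmeans(x, centers = 3)
cluster <- fit$cluster
centers <- fit$centers

# size: number of points assigned to each cluster
size <- tabulate(cluster, nbins = nrow(centers))

# energy (assumed Euclidean): distance from each sample to its assigned
# center, summed over all samples
energy <- sum(sqrt(rowSums((x - centers[cluster, ])^2)))
```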
Examples
# NOT RUN {
iris.mat <- as.matrix(iris[,1:4])
k <- length(unique(iris[, dim(iris)[2]])) # Number of unique classes
nstart <- 3
km <- KmeansPP(iris.mat, k, nstart=nstart)
# }