- df
a data frame or a matrix. Each row is a single observation and each column is a dimension.
The first column can contain an id for each observation (if id_column is TRUE);
otherwise the rownames are used.
- k
number of clusters. Note that in some cases the algorithm might return fewer clusters than k.
- metric
distance metric for kmeans++ seeding. Can be 'euclid', 'pearson' or 'spearman'.
- max_iter
maximum number of iterations
- min_delta
minimum change in assignments (as a fraction of all observations) required to continue iterating
- verbose
display algorithm messages
- keep_log
keep algorithm messages in 'log' field
- id_column
whether df's first column contains the observation id
- reorder_func
function to reorder the clusters. It is applied to each center, and the clusters are ordered by the result. For example, reorder_func = mean calculates the mean of each center and reorders the clusters accordingly. If reorder_func = hclust, the centers are ordered by hclust of the Euclidean distance of the correlation matrix, i.e. hclust(dist(cor(t(centers)))).
If NULL, no reordering is done.
- hclust_intra_clusters
run hierarchical clustering within each cluster and return an ordering of the observations.
- seed
seed for the c++ random number generator
- parallel
run the hierarchical clustering of each cluster in parallel (if hclust_intra_clusters is TRUE)
- use_cpp_random
use the c++ random number generator instead of R's. This should be used only for
backwards compatibility, as from version 0.4.0 onwards the default random number generator was changed to R's.
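
The two reorder_func behaviours described above can be illustrated with base R alone. This is a minimal sketch, not the package's internal code: the `centers` matrix is made up for illustration, and its layout (one row per cluster, one column per dimension) is an assumption inferred from the hclust(dist(cor(t(centers)))) expression.

```r
# Hypothetical centers: one row per cluster, one column per dimension
# (assumed layout, chosen only for this illustration).
centers <- rbind(
  c1 = c(5, 6, 7),
  c2 = c(1, 2, 9),
  c3 = c(3, 1, 2)
)

# reorder_func = mean: compute one score per center, then order clusters by it.
scores <- apply(centers, 1, mean)
order_by_mean <- order(scores)

# reorder_func = hclust: order the centers by hierarchical clustering of the
# Euclidean distance between rows of their correlation matrix.
order_by_hclust <- hclust(dist(cor(t(centers))))$order

# Either ordering can be used to permute the rows of the centers matrix.
centers_by_mean <- centers[order_by_mean, , drop = FALSE]
centers_by_hclust <- centers[order_by_hclust, , drop = FALSE]
```

With reorder_func = mean, the cluster with the smallest mean center comes first; with hclust, clusters whose centers have similar correlation profiles end up adjacent.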