This function performs non-hierarchical clustering based on dissimilarity using partitioning around medoids (PAM).
nhclu_pam(
dissimilarity,
index = names(dissimilarity)[3],
seed = NULL,
n_clust = c(1, 2, 3),
variant = "faster",
nstart = 1,
cluster_only = FALSE,
algorithm_in_output = TRUE,
...
)A list of class bioregion.clusters with five components:
name: A character string containing the name of the algorithm.
args: A list of input arguments as provided by the user.
inputs: A list of characteristics of the clustering process.
algorithm: A list of all objects associated with the clustering
procedure, such as original cluster objects (only if
algorithm_in_output = TRUE).
clusters: A data.frame containing the clustering results.
If algorithm_in_output = TRUE, the algorithm slot includes the output of
pam.
The output object from dissimilarity() or
similarity_to_dissimilarity(), or a dist object. If a data.frame is
used, the first two columns should represent pairs of sites (or any pair of
nodes), and the subsequent column(s) should contain the dissimilarity indices.
The name or number of the dissimilarity column to use. By
default, the third column name of dissimilarity is used.
A value for the random number generator (NULL for random by
default).
An integer vector or a single integer value specifying
the requested number(s) of clusters.
A character string specifying the PAM variant to use.
Defaults to faster. Available options are original, o_1, o_2, f_3,
f_4, f_5, or faster. See pam for more details.
An integer specifying the number of random starts for the PAM
algorithm. Defaults to 1 (for the faster variant).
A boolean specifying whether only the clustering
results should be returned from the pam function. Setting
this to TRUE makes the function more efficient.
A boolean indicating whether the original output
of pam should be included in the result. Defaults to TRUE
(see Value).
Additional arguments to pass to pam() (see pam).
Boris Leroy (leroy.boris@gmail.com)
Pierre Denelle (pierre.denelle@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
This method partitions the data into the chosen number of clusters based on the input dissimilarity matrix. It is more robust than k-means because it minimizes the sum of dissimilarities between cluster centers (medoids) and points assigned to the cluster. In contrast, k-means minimizes the sum of squared Euclidean distances, which makes it unsuitable for dissimilarity matrices that are not based on Euclidean distances.
Kaufman L & Rousseeuw PJ (2009) Finding groups in data: An introduction to cluster analysis. In & Sons. JW (ed.), Finding groups in data: An introduction to cluster analysis.
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html.
Associated functions: nhclu_clara nhclu_clarans nhclu_dbscan nhclu_kmeans nhclu_affprop
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site",1:20)
colnames(comat) <- paste0("Species",1:25)
comnet <- mat_to_net(comat)
dissim <- dissimilarity(comat, metric = "all")
clust <- nhclu_pam(dissim, n_clust = 2:15, index = "Simpson")
Run the code above in your browser using DataLab