cluster_similarity
From clusteval v0.1
by John Ramey
Computes the similarity between two clusterings of the same data set.
For two clusterings of the same data set, this function calculates the specified similarity statistic from the comemberships of the observations. Two observations are comembers if they are assigned to the same cluster, so the comemberships of a clustering are the pairs of observations that are clustered together.
Usage
cluster_similarity(labels1, labels2, similarity = c("jaccard", "rand"), method = "independence")
Arguments
 labels1
 a vector of n clustering labels
 labels2
 a vector of n clustering labels
 similarity
 the similarity statistic to calculate
 method
 the model under which the statistic was derived
Details
To calculate the similarity, we compute the 2x2 contingency table, consisting of the following four cells:
 n_11
 the number of observation pairs where both observations are comembers in both clusterings
 n_10
 the number of observation pairs where the observations are comembers in the first clustering but not the second
 n_01
 the number of observation pairs where the observations are comembers in the second clustering but not the first
 n_00
 the number of observation pairs where the observations are comembers in neither clustering
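The four cells can be counted directly in base R by enumerating every pair of observations. The following is a minimal sketch for illustration only: comembership_counts is a hypothetical helper, not a clusteval function, and it is not necessarily how the package computes the table.

```r
# Hypothetical helper (not part of clusteval): counts the four comembership
# cells for two label vectors of equal length.
comembership_counts <- function(labels1, labels2) {
  stopifnot(length(labels1) == length(labels2))
  # Enumerate all choose(n, 2) observation pairs as a 2 x choose(n, 2) matrix.
  pairs <- combn(length(labels1), 2)
  # TRUE where the two observations of a pair share a cluster.
  same1 <- labels1[pairs[1, ]] == labels1[pairs[2, ]]
  same2 <- labels2[pairs[1, ]] == labels2[pairs[2, ]]
  c(n_11 = sum(same1 & same2),
    n_10 = sum(same1 & !same2),
    n_01 = sum(!same1 & same2),
    n_00 = sum(!same1 & !same2))
}

# Toy example with n = 4 observations, so choose(4, 2) = 6 pairs in total.
counts <- comembership_counts(c(1, 1, 2, 2), c(1, 1, 1, 2))
print(counts)
```

Enumerating pairs with combn needs memory proportional to n choose 2, so this sketch is only suitable for small data sets.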
Currently, we have implemented the following similarity statistics:
 Rand index
 Jaccard coefficient
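As a sketch of how these two statistics relate to the contingency table, the standard textbook definitions are written out below in terms of the four cells. The function names rand_index and jaccard_coefficient are hypothetical and are not part of clusteval; the package's own implementation may differ in details such as how it handles a zero denominator.

```r
# Hypothetical helpers (not clusteval functions) using the standard definitions.

# Rand index: proportion of observation pairs on which the clusterings agree.
rand_index <- function(n_11, n_10, n_01, n_00) {
  (n_11 + n_00) / (n_11 + n_10 + n_01 + n_00)
}

# Jaccard coefficient: like the Rand index, but ignores the pairs that are
# comembers in neither clustering (n_00).
jaccard_coefficient <- function(n_11, n_10, n_01) {
  n_11 / (n_11 + n_10 + n_01)
}

# Toy counts: n_11 = 1, n_10 = 1, n_01 = 2, n_00 = 2 (6 pairs in total).
rand_value <- rand_index(1, 1, 2, 2)
jaccard_value <- jaccard_coefficient(1, 1, 2)
print(c(rand = rand_value, jaccard = jaccard_value))
```

With these toy counts the Rand index is 3/6 = 0.5 and the Jaccard coefficient is 1/4 = 0.25. The Jaccard coefficient is undefined when no pair is a comember in either clustering, in which case this sketch returns NaN.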
To compute the contingency table, we use the comembership_table function.
Value
the similarity between the two clusterings
Examples
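library(clusteval)
# Notice that the number of observation pairs is 'n choose 2'.
# Cluster the four numeric columns of iris (column 5, Species, is dropped).
iris_kmeans <- kmeans(iris[, -5], centers = 3)$cluster
iris_hclust <- cutree(hclust(dist(iris[, -5])), k = 3)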
cluster_similarity(iris_kmeans, iris_hclust)