rand_indep: Computes the Rand similarity index of two clusterings of the same data set under the assumption that the two clusterings are independent.

Description

For two clusterings of the same data set, this function calculates the Rand similarity coefficient of the clusterings from the comemberships of the observations. Two observations are comembers if they are assigned to the same cluster.

Usage

rand_indep(labels1, labels2)

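Arguments

labels1
a vector of cluster labels for the n observations under the first clustering

labels2
a vector of cluster labels for the same n observations under the second clustering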

Value

the Rand index for the two sets of cluster labels

Details

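To calculate the Rand index, we compute the 2x2 contingency table of comemberships over all pairs of observations. It consists of the following four cells:

$n_{11}$: the number of observation pairs that are comembers in both clusterings

$n_{10}$: the number of observation pairs that are comembers in the first clustering but not in the second

$n_{01}$: the number of observation pairs that are comembers in the second clustering but not in the first

$n_{00}$: the number of observation pairs that are comembers in neither clustering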

The Rand similarity index is defined as $$R = \frac{n_{11} + n_{00}}{n_{11} + n_{10} + n_{01} + n_{00}},$$ where the denominator is the total number of observation pairs, $n(n - 1)/2$ for $n$ observations.

To compute the contingency table, we use the comembership_table function.
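
As an illustration of the definition above, the four pair counts and the resulting index can be computed directly in base R. This is only a minimal sketch: `rand_sketch` is a hypothetical helper written for this example, not the package's implementation, and it does not use `comembership_table`. It builds an n-by-n comparison matrix, so it is suitable only for small inputs.

```r
# Hypothetical helper (not part of the package): count comembership pairs
# for two label vectors and derive the Rand index from the four counts.
rand_sketch <- function(labels1, labels2) {
  stopifnot(length(labels1) == length(labels2))

  # TRUE where observations i and j share a cluster in each clustering.
  same1 <- outer(labels1, labels1, "==")
  same2 <- outer(labels2, labels2, "==")

  # Keep each unordered pair (i < j) exactly once.
  pairs <- upper.tri(same1)
  co1 <- same1[pairs]
  co2 <- same2[pairs]

  n_11 <- sum(co1 & co2)    # comembers in both clusterings
  n_10 <- sum(co1 & !co2)   # comembers in the first only
  n_01 <- sum(!co1 & co2)   # comembers in the second only
  n_00 <- sum(!co1 & !co2)  # comembers in neither

  (n_11 + n_00) / (n_11 + n_10 + n_01 + n_00)
}

# Four observations give six pairs: n_11 = 1, n_10 = 1, n_01 = 0, n_00 = 4,
# so the index is 5/6.
rand_sketch(c(1, 1, 2, 2), c(1, 1, 2, 3))
```

Because the index depends only on which pairs are grouped together, relabeling the clusters of either input leaves the result unchanged.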

Examples

# We randomly assign each of n = 10 observations to one of K = 3 clusters,
# twice, and compute the Rand similarity index between the two clusterings.
set.seed(42)
K <- 3
n <- 10
labels1 <- sample.int(K, n, replace = TRUE)
labels2 <- sample.int(K, n, replace = TRUE)
rand_indep(labels1, labels2)

# Here, we cluster the iris data set with the K-means and
# hierarchical algorithms using the true number of clusters, K = 3.
# Then, we compute the Rand similarity index between the two clusterings.
iris_kmeans <- kmeans(iris[, -5], centers = 3)$cluster
iris_hclust <- cutree(hclust(dist(iris[, -5])), k = 3)
rand_indep(iris_kmeans, iris_hclust)
