# cluster_similarity: Computes the similarity between two clusterings of the same data set.

## Description

For two clusterings of the same data set, this function
calculates the specified similarity statistic from the
comemberships of the observations. Two observations are
comembers if they are assigned to the same cluster.

## Usage

cluster_similarity(labels1, labels2, similarity = c("jaccard", "rand"), method = "independence")

## Arguments

labels1

a vector of `n` clustering labels

labels2

a vector of `n` clustering labels

similarity

the similarity statistic to calculate, either `"jaccard"` or `"rand"`

method

the model under which the statistic was derived

## Value

the similarity between the two clusterings

## Details

To calculate the similarity, we compute the 2x2
contingency table, consisting of the following four
cells:

- n_11: the number of observation pairs where both observations are comembers in both clusterings
- n_10: the number of observation pairs where the observations are comembers in the first clustering but not the second
- n_01: the number of observation pairs where the observations are comembers in the second clustering but not the first
- n_00: the number of observation pairs where the observations are comembers in neither clustering
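
As a rough illustration of the four cells, the sketch below counts them directly with base R. The helper name `comembership_counts` and the toy label vectors are hypothetical and introduced only for this example; this is not the package's `comembership_table` implementation, just one straightforward way to obtain the same kind of counts on a small input.

```r
# Hypothetical helper for illustration only (not the package's
# comembership_table): counts the four cells of the 2x2 table.
comembership_counts <- function(labels1, labels2) {
  stopifnot(length(labels1) == length(labels2))
  # One column per unordered pair (i, j) with i < j: choose(n, 2) columns.
  pairs <- combn(length(labels1), 2)
  # TRUE when the two observations of a pair share a cluster.
  same1 <- labels1[pairs[1, ]] == labels1[pairs[2, ]]
  same2 <- labels2[pairs[1, ]] == labels2[pairs[2, ]]
  c(n_11 = sum(same1 & same2),
    n_10 = sum(same1 & !same2),
    n_01 = sum(!same1 & same2),
    n_00 = sum(!same1 & !same2))
}

# Toy labels (assumed for the example): 4 observations, so 6 pairs.
labels_a <- c(1, 1, 2, 2)
labels_b <- c(1, 1, 1, 2)
counts <- comembership_counts(labels_a, labels_b)
print(counts)  # n_11 = 1, n_10 = 1, n_01 = 2, n_00 = 2
```

Enumerating every pair with `combn` costs memory proportional to `choose(n, 2)`, so this form is only meant for small inputs.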

Currently, we have implemented the following similarity
statistics:

- Rand index
- Jaccard coefficient
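
For reference, the textbook definitions of both statistics in terms of the four cells can be sketched as follows. The cell counts here are a made-up toy table, and the variable names are assumptions for the example; how the `method` argument affects the package's own computation is not covered by this sketch.

```r
# Toy 2x2 comembership counts (assumed values for illustration).
n_11 <- 1  # comembers in both clusterings
n_10 <- 1  # comembers in the first clustering only
n_01 <- 2  # comembers in the second clustering only
n_00 <- 2  # comembers in neither clustering

# Rand index: share of pairs on which the two clusterings agree.
rand_index <- (n_11 + n_00) / (n_11 + n_10 + n_01 + n_00)

# Jaccard coefficient: like the Rand index, but ignoring pairs that
# are comembers in neither clustering.
jaccard <- n_11 / (n_11 + n_10 + n_01)

print(c(rand = rand_index, jaccard = jaccard))  # rand = 0.5, jaccard = 0.25
```

Because the Jaccard coefficient drops n_00, it is not inflated by the many pairs that both clusterings simply keep apart.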

To compute the contingency table, we use the
`comembership_table` function.

## Examples

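# Compare a k-means and a hierarchical clustering of the iris measurements.
# Notice that the number of comemberships is 'n choose 2'; with n = 150
# observations that is choose(150, 2) = 11175 pairs.
set.seed(42)  # kmeans() uses random starting centers; fix the seed for reproducibility
iris_kmeans <- kmeans(iris[, -5], centers = 3)$cluster
iris_hclust <- cutree(hclust(dist(iris[, -5])), k = 3)
cluster_similarity(iris_kmeans, iris_hclust)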