randIndex: Rand Index

Description

Compute the Rand Index for agreement of two partitions.

Usage

randIndex(x, y, correct=TRUE)
## S4 method for signature 'table,missing':
randIndex(x, y, correct=TRUE)
## S4 method for signature 'flexclust,flexclust':
randIndex(x, y, correct=TRUE)
## S4 method for signature 'integer,integer':
randIndex(x, y, correct=TRUE)
## S4 method for signature 'flexclust,integer':
randIndex(x, y, correct=TRUE)
## S4 method for signature 'integer,flexclust':
randIndex(x, y, correct=TRUE)

Arguments

x

Either a 2-dimensional cross-tabulation of cluster assignments, or an object inheriting from class "flexclust", or an integer vector of cluster memberships.

y

An (optional) object inheriting from class "flexclust", or an integer vector of cluster memberships.

correct

Logical. If TRUE (the default), the index is corrected for agreement by chance.

Value

A number between -1 and 1 for the corrected version (correct=TRUE), or a number between 0 and 1 for the original version (correct=FALSE).

Details

Suppose we want to compare two partitions summarized by the contingency table $T=[t_{ij}]$ where $i,j=1,\ldots,K$ and $t_{ij}$ denotes the number of data points which are in cluster $i$ in the first partition and in cluster $j$ in the second partition. Let $A$ denote the number of pairs of data points which are either put into the same cluster by both partitions or put into different clusters by both partitions. Conversely, let $D$ denote the number of pairs of data points that are put into the same cluster by one partition, but into different clusters by the other. The partitions agree on the $A$ pairs and disagree on the $D$ pairs, and every pair of data points is counted in exactly one of the two. We can measure the agreement by the Rand index $A/(A+D)$, which is invariant with respect to permutations of the columns or rows of $T$.
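The counts $A$ and $D$ can be obtained from $T$ with binomial coefficients instead of enumerating all pairs. The following is a minimal sketch in base R and is not the flexclust implementation; the function name `rand_by_hand` and the toy membership vectors are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: uncorrected Rand index from two membership vectors.
rand_by_hand <- function(x, y) {
  tab <- unclass(table(x, y))               # contingency table T = [t_ij]
  n_pairs <- choose(sum(tab), 2)            # total number of pairs of points
  same_both <- sum(choose(tab, 2))          # pairs together in both partitions
  same_x <- sum(choose(rowSums(tab), 2))    # pairs together in the first partition
  same_y <- sum(choose(colSums(tab), 2))    # pairs together in the second partition
  # Agreements: together in both partitions, or separated in both
  A <- same_both + (n_pairs - same_x - same_y + same_both)
  # Disagreements: together in exactly one of the two partitions
  D <- same_x + same_y - 2 * same_both
  list(A = A, D = D, rand = A / (A + D))
}

x <- c(1, 1, 2, 2, 3, 3)
y <- c(1, 1, 1, 2, 2, 2)
res <- rand_by_hand(x, y)
res$A     # 10 agreeing pairs
res$D     # 5 disagreeing pairs
res$rand  # 10 / 15

# Relabelling the clusters only permutes the rows of T,
# so the index is unchanged.
x_relabelled <- c(3, 3, 1, 1, 2, 2)
rand_by_hand(x_relabelled, y)$rand
```

With 6 data points there are 15 pairs in total, so $A + D = 15$ here, and the relabelled partition gives the same value as the original one.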

The index has to be corrected for agreement by chance if the sizes of the clusters are not uniform (which is usually the case) or if there are many clusters; see Hubert & Arabie (1985) for details.
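As an illustration of the correction, the sketch below computes the adjusted Rand index in the form commonly attributed to Hubert & Arabie (1985): the number of pairs placed together by both partitions, minus its expected value under chance, divided by the maximum minus the expected value. It is written in base R under that assumption and may differ from what flexclust does internally; the name `adjusted_rand_sketch` is hypothetical.

```r
# Hypothetical helper: chance-corrected Rand index from a contingency table.
adjusted_rand_sketch <- function(tab) {
  tab <- unclass(tab)
  n_pairs <- choose(sum(tab), 2)            # total number of pairs of points
  same_both <- sum(choose(tab, 2))          # pairs together in both partitions
  same_x <- sum(choose(rowSums(tab), 2))    # pairs together in the first partition
  same_y <- sum(choose(colSums(tab), 2))    # pairs together in the second partition
  expected <- same_x * same_y / n_pairs     # expected same_both with fixed margins
  maximum <- (same_x + same_y) / 2          # upper bound used for normalisation
  (same_both - expected) / (maximum - expected)
}

x <- c(1, 1, 2, 2, 3, 3)
y <- c(1, 1, 1, 2, 2, 2)

adjusted_rand_sketch(table(x, x))  # identical partitions
adjusted_rand_sketch(table(x, y))  # partial agreement
```

Identical partitions give a value of 1, while the partially agreeing toy partitions give a value well below their uncorrected Rand index. The sketch does not guard against a zero denominator, which occurs for example when both partitions put all points into a single cluster.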

References

Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of Classification, 2, 193--218, 1985.

Examples

## randIndex() is provided by the flexclust package
library(flexclust)

## no class correlations: corrected Rand almost zero
g1 <- sample(1:5, size=1000, replace=TRUE)
g2 <- sample(1:5, size=1000, replace=TRUE)
tab <- table(g1, g2)
randIndex(tab)

## uncorrected version will be large, because there are many pairs of
## points which are assigned to different clusters in both partitions
randIndex(tab, correct=FALSE)


## let pairs (g1=1,g2=1) and (g1=3,g2=3) agree better
k <- sample(1:1000, size=200)
g1[k] <- 1
g2[k] <- 1
k <- sample(1:1000, size=200)
g1[k] <- 3
g2[k] <- 3
tab <- table(g1, g2)

## the index should be larger than before
randIndex(tab)
randIndex(tab, correct=FALSE)