classAgreement() computes several coefficients of agreement between the columns and rows of a 2-way contingency table.
classAgreement(tab, match.names=FALSE)
tab
kappa: diag corrected for agreement by chance.
If match.names is TRUE, the class labels as given by the row and column names are matched, i.e. only columns and rows with the same dimnames are used for the computation.
If the two classifications do not use the same set of labels, or if identical labels can have different meanings (e.g., two outcomes of cluster analysis on the same data set), then the situation is more complicated. Let $A$ denote the number of all pairs of data points which are either put into the same cluster by both partitions or put into different clusters by both partitions. Conversely, let $D$ denote the number of all pairs of data points that are put into one cluster in one partition, but into different clusters by the other partition. Hence, the partitions disagree for all pairs $D$ and agree for all pairs $A$. We can measure the agreement by the Rand index $A/(A+D)$, which is invariant with respect to permutations of the columns or rows of the contingency table $T$.
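The pair counts $A$ and $D$ can be obtained from the contingency table alone, without enumerating pairs. The following is a minimal sketch, not the package's own implementation; the helper name `rand_index` and the vectors `p1`, `p2`, `p3` are hypothetical and chosen only for illustration.

```r
## Hypothetical helper (not part of e1071): Rand index by pair counting.
rand_index <- function(tab) {
  n <- sum(tab)
  total_pairs <- choose(n, 2)
  ## pairs put into the same cluster by both partitions
  same_both <- sum(choose(tab, 2))
  ## pairs put into the same cluster by the row / column partition
  same_rows <- sum(choose(rowSums(tab), 2))
  same_cols <- sum(choose(colSums(tab), 2))
  ## D: pairs that are together in exactly one of the two partitions
  D <- same_rows + same_cols - 2 * same_both
  ## A: all remaining pairs (together in both, or separated in both)
  A <- total_pairs - D
  A / (A + D)
}

p1 <- c(1, 1, 2, 2, 3, 3)
p2 <- c("b", "b", "c", "c", "a", "a")  # same grouping as p1, labels permuted
p3 <- c(1, 1, 1, 2, 2, 2)              # a coarser, different grouping

r_same <- rand_index(table(p1, p2))    # identical partitions
r_diff <- rand_index(table(p1, p3))    # partially agreeing partitions
```

Because only cell counts and marginal sums enter the computation, relabelling the clusters (permuting rows or columns) leaves the result unchanged.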
Both indices have to be corrected for agreement by chance if the sizes of the classes are not uniform.
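As an illustration of what a chance correction looks like, the sketch below uses the standard textbook definitions of Cohen's kappa and of the adjusted Rand index of Hubert and Arabie. It is an assumption that these match what classAgreement() computes internally; the helper name `chance_corrected` is hypothetical, and the table is assumed to be square with rows and columns in the same label order.

```r
## Hypothetical helper (not part of e1071): chance-corrected agreement.
chance_corrected <- function(tab) {
  n <- sum(tab)
  rs <- rowSums(tab)
  cs <- colSums(tab)

  ## Cohen's kappa: observed diagonal agreement vs. agreement expected
  ## if rows and columns were independent
  p_obs <- sum(diag(tab)) / n
  p_exp <- sum(rs * cs) / n^2
  kappa_hat <- (p_obs - p_exp) / (1 - p_exp)

  ## Adjusted Rand index: pair counts vs. their expectation under
  ## random partitions with the same marginal class sizes
  same_both <- sum(choose(tab, 2))
  same_rows <- sum(choose(rs, 2))
  same_cols <- sum(choose(cs, 2))
  expected  <- same_rows * same_cols / choose(n, 2)
  crand_hat <- (same_both - expected) /
    ((same_rows + same_cols) / 2 - expected)

  list(kappa = kappa_hat, crand = crand_hat)
}

g <- c(1, 1, 2, 2, 3, 3)
res <- chance_corrected(table(g, g))   # perfect agreement
```

Both corrected indices equal 1 for perfect agreement and are close to 0 when the two classifications are unrelated, which is the behaviour illustrated by the simulated example in this page.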
Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of Classification, 2, 193--218, 1985.
matchClasses
library(e1071)
## no class correlations: both kappa and crand almost zero
g1 <- sample(1:5, size=1000, replace=TRUE)
g2 <- sample(1:5, size=1000, replace=TRUE)
tab <- table(g1, g2)
classAgreement(tab)
## let pairs (g1=1,g2=1) and (g1=3,g2=3) agree better
k <- sample(1:1000, size=200)
g1[k] <- 1
g2[k] <- 1
k <- sample(1:1000, size=200)
g1[k] <- 3
g2[k] <- 3
tab <- table(g1, g2)
## both kappa and crand should be significantly larger than before
classAgreement(tab)