best.agreement: Determine agreement of two classifications

Description

Two classifications of the same observations may use different, arbitrary class labels, which prevents a direct comparison of their assignments. This algorithm considers all possible permutations of class labels to find the configuration that maximizes agreement on the diagonal of a contingency table comparing the two classifications. The classifications need not have the same number of classes.
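The exhaustive search can be sketched in base R as follows. This is a hypothetical illustration, not the package's source: the helper names all.perms and best.agree.sketch, and the returned fields n.perms, n.max, agree and relabel, are assumptions made for this example. The sketch holds class1 fixed, tries every ordering of the class2 labels (c! of them), and scores each ordering by the proportion of observations that fall on the diagonal of table(class1, class2). Read literally, when class1 has more classes than class2, only the first c rows of the table can lie on the diagonal.

```r
# Hypothetical sketch of an exhaustive label-permutation search (not asbio's code).

# all.perms(n): every permutation of 1..n, one per row (n! rows in total).
all.perms <- function(n) {
  if (n == 1) return(matrix(1L, nrow = 1, ncol = 1))
  sub <- all.perms(n - 1)
  out <- NULL
  for (i in seq_len(n)) {
    rest <- setdiff(seq_len(n), i)  # labels still available after placing i first
    out <- rbind(out, cbind(rep(i, nrow(sub)), matrix(rest[sub], nrow = nrow(sub))))
  }
  out
}

best.agree.sketch <- function(class1, class2) {
  tab <- table(class1, class2)      # rows: class1 labels, columns: class2 labels
  k <- min(nrow(tab), ncol(tab))    # length of the diagonal
  perms <- all.perms(ncol(tab))     # all c! orderings of the class2 labels
  # Proportion of observations on the diagonal after reordering columns by p
  agree <- apply(perms, 1, function(p) {
    sum(tab[cbind(seq_len(k), p[seq_len(k)])]) / sum(tab)
  })
  best <- which(agree == max(agree))
  list(
    n.perms = nrow(perms),
    n.max = length(best),
    agree = max(agree),
    # First best ordering: class1 label (name) matched to a class2 label (value)
    relabel = setNames(colnames(tab)[perms[best[1], seq_len(k)]],
                       rownames(tab)[seq_len(k)])
  )
}
```

For example, applying best.agree.sketch to the seven-cluster average-linkage and six-cluster Ward-linkage cutree results for USArrests scores 6! = 720 orderings. When class2 has more classes than class1, orderings that differ only in the columns off the diagonal tie, so n.max counts each of them.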

Usage

best.agreement(class1, class2, test = FALSE, rperm = 100)

Arguments

class1

A vector containing class assignments for observations, e.g., the result of cutree.

class2

A vector containing class assignments for a second classification of the same observations.

test

Logical. Indicates whether to run a permutation test of the null hypothesis that agreement between class1 and class2 is no better than random.

rperm

If test = TRUE, the number of random permutations used in null hypothesis testing.

Value

n.possible.perms

Number of permutations considered

n.max.solutions

Number of configurations in which classification agreement is maximized. The first configuration identified is reported in max.class.names1 and max.class.names2.

max.agree

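Maximum proportion of observations assigned to corresponding classes in both classifications, i.e., on the diagonal of the best relabeled contingency table.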

max.class.names1

Class labels in the first classification that allow maximum agreement.

max.class.names2

Class labels in the second classification that allow maximum agreement.

p.val

If test = TRUE, the p-value for the null hypothesis test described in Details.
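Under a literal reading of the Description and Details (an interpretation, not a formula stated by the package), the reported agreement can be written as below, where $n_{ij}$ is the number of observations in class $i$ of class1 and class $j$ of class2, $n$ is the total number of observations, $r$ and $c$ are the numbers of classes in class1 and class2, and $\pi$ ranges over the $c!$ permutations of the class2 labels:

$$\text{max.agree} = \max_{\pi \in S_c} \frac{1}{n} \sum_{i=1}^{\min(r,\,c)} n_{i,\pi(i)}$$

In this reading, n.possible.perms equals $c!$ and n.max.solutions is the number of permutations $\pi$ that attain the maximum.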

Details

Class assignments in class1 are held fixed, and all possible permutations of the class labels in class2 are considered to find the configuration that maximizes agreement between the two classifications. If test = TRUE, a permutation test is run for the null hypothesis that maximum agreement between the classifications is no better than random. This is done by sampling class2 without replacement rperm times (i.e., randomly permuting its assignments), finding the maximum agreement between class1 and each randomly permuted classification, and basing the p-value on one plus the number of times that this random maximum agreement was greater than the maximum agreement observed for class1 and class2. Testing can be slow because it requires nested loops with $p \times c!$ steps, where $p$ is rperm and $c!$ is the number of possible permutations of the $c$ categories in class2.
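A permutation test along these lines might look like the following. It is a sketch under stated assumptions, not the package's implementation: max.diag.agree and perm.test.sketch are hypothetical names; the maximum is found with a depth-first search over distinct row-to-column matches, which gives the same maximum as trying all c! column orderings but is written differently; and the p-value uses the common estimator (1 + number of permuted maxima at least as large as the observed one) / (rperm + 1). The text above says "greater than" and does not state the divisor, so the package may use a strict inequality or a different denominator.

```r
# Hypothetical sketch of the permutation test (not asbio's code).

# Maximum diagonal agreement: match rows 1..k of table(class1, class2) to
# distinct columns, exploring every choice depth-first.
max.diag.agree <- function(class1, class2) {
  tab <- table(class1, class2)
  k <- min(nrow(tab), ncol(tab))
  best <- 0
  search <- function(i, free, total) {
    if (i > k) {                       # all diagonal rows matched
      best <<- max(best, total)
      return(invisible(NULL))
    }
    for (j in free) search(i + 1, setdiff(free, j), total + tab[i, j])
  }
  search(1, seq_len(ncol(tab)), 0)
  best / sum(tab)
}

perm.test.sketch <- function(class1, class2, rperm = 100) {
  obs <- max.diag.agree(class1, class2)
  # Null distribution: shuffle class2 (sampling without replacement) rperm times
  null <- replicate(rperm, max.diag.agree(class1, sample(class2)))
  # Assumed estimator: count permuted maxima >= observed, add one to both parts
  (1 + sum(null >= obs)) / (rperm + 1)
}
```

The cost follows the $p \times c!$ estimate above: with rperm = 100 and six classes in both classifications, each call to max.diag.agree visits 6! = 720 complete matches, for about 72,000 in total.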

Examples

# Compare a seven-cluster average-linkage solution
# with a six-cluster Ward-linkage solution

avg <- hclust(dist(USArrests), "average")
avg.7 <- as.matrix(cutree(avg, k = 7))
war <- hclust(dist(USArrests), "ward.D")
war.6 <- as.matrix(cutree(war, k = 6))

best.agreement(avg.7, war.6)