
COMMUNAL (version 1.1.0)

clusterKeys: Rekey cluster assignments.

Description

Reindexes (rekeys) the cluster assignments so that cluster labels overlap as much as possible across algorithms. Algorithms that could not find k clusters, or that produced a cluster smaller than the min.size argument, are ignored. Use this after determining the optimal number of clusters k (via plotRange3D).
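To illustrate what "rekeying to maximize overlap" means, here is a minimal base-R sketch. It is not the COMMUNAL implementation. It assumes the first algorithm serves as the reference labeling and matches every other algorithm's labels to it greedily, using the cross-tabulated overlap counts. The functions relabel_to_reference and rekey_sketch are hypothetical names introduced only for this illustration.

```r
# Sketch only (not COMMUNAL's code): relabel assignment vector `x` so its labels
# overlap as much as possible with reference vector `ref`, via greedy matching.
# Labels in `x` left unmatched (more clusters than `ref`) are mapped to 0.
relabel_to_reference <- function(ref, x) {
  # Contingency table: rows are labels of x, columns are labels of ref
  tab <- unclass(table(x, ref))
  mapping <- setNames(integer(nrow(tab)), rownames(tab))
  while (nrow(tab) > 0 && ncol(tab) > 0) {
    # Take the (x label, ref label) pair that shares the most samples
    idx <- which(tab == max(tab), arr.ind = TRUE)[1, ]
    mapping[rownames(tab)[idx[1]]] <- as.integer(colnames(tab)[idx[2]])
    # Each label can be matched only once on either side
    tab <- tab[-idx[1], -idx[2], drop = FALSE]
  }
  unname(mapping[as.character(x)])
}

# Hypothetical wrapper: the first algorithm is assumed to be the reference
rekey_sketch <- function(clusters) {
  ref <- clusters[[1]]
  sapply(clusters, function(col) relabel_to_reference(ref, col))
}

# Same assignments as the first example below, written with rep()
clusters <- data.frame(
  alg1 = rep(1:3, each = 5),
  alg2 = rep(c(1L, 3L, 2L), each = 5),
  alg3 = rep(c(3L, 1L, 2L), each = 5)
)
m <- rekey_sketch(clusters)
m  # all three columns now carry identical labels
```

Greedy matching keeps the sketch short. An optimal one-to-one matching (for example, the Hungarian algorithm as provided by clue::solve_LSAP) could replace the while loop when clusters overlap only partially.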

Usage

clusterKeys(clusters, min.size = 3)

Arguments

clusters
Data frame of cluster assignments, with one row per sample, one column per algorithm, and integer cluster labels. For example, the output of the getClustering method of a "COMMUNAL" object.
min.size
Minimum cluster size. Algorithms that return any cluster smaller than this, or that do not return k clusters, are discarded.
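The discard rule described for min.size can be expressed as a short column filter. This base-R sketch is an illustration, not COMMUNAL's code. The helper keep_valid_algorithms and its explicit k argument are hypothetical: clusterKeys itself takes no k argument.

```r
# Hypothetical helper: keep only algorithms (columns) that found exactly k
# clusters, each containing at least min.size samples.
keep_valid_algorithms <- function(clusters, k, min.size = 3) {
  ok <- vapply(clusters, function(col) {
    sizes <- table(col)                      # samples per cluster label
    length(sizes) == k && all(sizes >= min.size)
  }, logical(1))
  clusters[, ok, drop = FALSE]
}

clusters <- data.frame(
  alg1 = rep(1:3, each = 5),                 # 3 clusters of 5: kept
  alg2 = c(rep(1L, 13), 2L, 3L),             # clusters 2 and 3 too small: dropped
  alg3 = rep(1:2, length.out = 15)           # only 2 clusters: dropped
)
kept <- keep_valid_algorithms(clusters, k = 3)
names(kept)
```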

Value

Returns a matrix of rekeyed cluster assignments, such that cluster 'n' refers to the same cluster across all algorithms. Cluster 0 contains the samples for which no consistent 'core' cluster could be identified.
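The agreement.thresh values used with returnCore in the examples below are percentages of algorithms that must agree on a sample's label. Here is a hedged base-R sketch of that idea. The helper core_sketch is hypothetical, and the assumptions that ties go to the first modal label and that the comparison is "at least the threshold" are illustrative, not taken from COMMUNAL. Samples that fall below the threshold receive 0.

```r
# Hypothetical sketch: per sample (row), find the most common label across
# algorithms and keep it only if at least agreement.thresh percent agree.
core_sketch <- function(mat.key, agreement.thresh = 50) {
  apply(mat.key, 1, function(row) {
    counts <- table(row)
    top <- counts[which.max(counts)]         # first modal label on ties
    pct <- 100 * as.numeric(top) / length(row)
    if (pct >= agreement.thresh) as.integer(names(top)) else 0L
  })
}

mat <- cbind(alg1 = c(1, 1, 2, 1),
             alg2 = c(1, 1, 2, 2),
             alg3 = c(1, 2, 3, 3))
core_sketch(mat, agreement.thresh = 66)      # 2 of 3 must agree
core_sketch(mat, agreement.thresh = 99)      # all 3 must agree
```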

Examples

library(COMMUNAL)

# reindexes cluster numbers so they agree across algorithms
clusters <- data.frame(
  alg1=as.integer(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)),
  alg2=as.integer(c(1,1,1,1,1,3,3,3,3,3,2,2,2,2,2)),
  alg3=as.integer(c(3,3,3,3,3,1,1,1,1,1,2,2,2,2,2))
)
mat.key <- clusterKeys(clusters)
mat.key # cluster indices are relabeled
examineCounts(mat.key)
core <- returnCore(mat.key, agreement.thresh=50) # find 'core' clusters
table(core) # the 'core' clusters

# some cluster assignments are undetermined
clusters <- data.frame(
  alg1=as.integer(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,1,1,2,2,3,3)),
  alg2=as.integer(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,1,2,2,3,3,1)),
  alg3=as.integer(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,2,3,1,1,2,3))
)
mat.key <- clusterKeys(clusters)
mat.key # last six samples have conflicting assignments
examineCounts(mat.key)
core <- returnCore(mat.key, agreement.thresh=66) # at least 2 of 3 algs must agree
table(core)
core <- returnCore(mat.key, agreement.thresh=99) # all algs must agree
table(core)
