cgrestandard: Standardise cluster validation statistics by random clustering results

Description

Standardises cluster validity statistics as produced by clustatsum relative to results that were achieved by random clusterings on the same data by randomclustersim. The aim is to make differences between values comparable between indexes, see Hennig (2017), Akhanli and Hennig (2020).

This is mainly for use within clusterbenchstats.

Usage

cgrestandard(clusum,clusim,G,percentage=FALSE,
                               useallmethods=FALSE,
                             useallg=FALSE, othernc=list())

Arguments

clusum

object of class "valstat", see clusterbenchstats.

clusim

list; output object of randomclustersim, see there.

vector of integers. Numbers of clusters to consider.

percentage

logical. If FALSE, standardisation is done to mean zero and standard deviation 1 using the random clusterings. If TRUE, the output is the percentage of simulated values below the result (more precisely, this number plus one divided by the total plus one).

useallmethods

logical. If FALSE, only random clustering results from clusim are used for standardisation. If TRUE, also clustering results from other methods as given in clusum are used.

useallg

logical. If TRUE, standardisation uses results from all numbers of clusters in G. If FALSE, standardisation of results for a specific number of cluster only uses results from that number of clusters.

othernc

list of integer vectors of length 2. This allows the incorporation of methods that bring forth other numbers of clusters than those in G, for example because a method may have automatically estimated a number of clusters. The first number is the number of the clustering method (the order is determined by argument clustermethod in clusterbenchstats), the second number is the number of clusters. Results specified here are only standardised in useallg=TRUE.

Value

List of class "valstat", see valstat.object, with standardised results as explained above.

Details

cgrestandard will add a statistic named dmode to the input set of validation statistics, which is defined as 0.75*dindex+0.25*highdgap, aggregating these two closely related statistics, see clustatsum.

References

Hennig, C. (2017) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Proceedings of ASMDA 2017, 501-520, https://arxiv.org/abs/1703.09282

Akhanli, S. and Hennig, C. (2020) Calibrating and aggregating cluster validity indexes for context-adapted comparison of clusterings. Accepted for publication by Statistics and Computing, https://arxiv.org/abs/2002.01822

Examples

Run this code

# NOT RUN {
  
  set.seed(20000)
  options(digits=3)
  face <- rFace(10,dMoNo=2,dNoEy=0,p=2)
  dif <- dist(face)
  clusum <- list()
  clusum[[2]] <- list()
  cl12 <- kmeansCBI(face,2)
  cl13 <- kmeansCBI(face,3)
  cl22 <- claraCBI(face,2)
  cl23 <- claraCBI(face,2)
  ccl12 <- clustatsum(dif,cl12$partition)
  ccl13 <- clustatsum(dif,cl13$partition)
  ccl22 <- clustatsum(dif,cl22$partition)
  ccl23 <- clustatsum(dif,cl23$partition)
  clusum[[1]] <- list()
  clusum[[1]][[2]] <- ccl12
  clusum[[1]][[3]] <- ccl13
  clusum[[2]][[2]] <- ccl22
  clusum[[2]][[3]] <- ccl23
  clusum$maxG <- 3
  clusum$minG <- 2
  clusum$method <- c("kmeansCBI","claraCBI")
  clusum$name <- c("kmeansCBI","claraCBI")
  clusim <- randomclustersim(dist(face),G=2:3,nnruns=1,kmruns=1,
    fnruns=1,avenruns=1,monitor=FALSE)
  cgr <- cgrestandard(clusum,clusim,2:3)
  cgr2 <- cgrestandard(clusum,clusim,2:3,useallg=TRUE)
  cgr3 <- cgrestandard(clusum,clusim,2:3,percentage=TRUE)
  print(str(cgr))
  print(str(cgr2))
  print(cgr3[[1]][[2]])
# }

Run the code above in your browser using DataLab