cgrestandard: Standardise cluster validation statistics by random clustering results

Description

Standardises cluster validity statistics as produced by clustatsum relative to results that were achieved by random clusterings on the same data by randomclustersim. The aim is to make differences between values comparable between indexes, see Hennig (2019), Akhanli and Hennig (2020).

This is mainly for use within clusterbenchstats.

Usage

cgrestandard(clusum,clusim,G,percentage=FALSE,
                               useallmethods=FALSE,
                             useallg=FALSE, othernc=list())

Value

List of class "valstat", see

valstat.object, with standardised results as explained above.

Arguments

clusum: object of class "valstat", see clusterbenchstats.
clusim: list; output object of randomclustersim, see there.
G: vector of integers. Numbers of clusters to consider.
percentage: logical. If FALSE, standardisation is done to mean zero and standard deviation 1 using the random clusterings. If TRUE, the output is the percentage of simulated values below the result (more precisely, this number plus one divided by the total plus one).
useallmethods: logical. If FALSE, only random clustering results from clusim are used for standardisation. If TRUE, also clustering results from other methods as given in clusum are used.
useallg: logical. If TRUE, standardisation uses results from all numbers of clusters in G. If FALSE, standardisation of results for a specific number of cluster only uses results from that number of clusters.
othernc: list of integer vectors of length 2. This allows the incorporation of methods that bring forth other numbers of clusters than those in G, for example because a method may have automatically estimated a number of clusters. The first number is the number of the clustering method (the order is determined by argument clustermethod in clusterbenchstats), the second number is the number of clusters. Results specified here are only standardised in useallg=TRUE.

Author

Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en/

Details

cgrestandard will add a statistic named dmode to the input set of validation statistics, which is defined as 0.75*dindex+0.25*highdgap, aggregating these two closely related statistics, see clustatsum.

References

Hennig, C. (2019) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Data Analysis and Applications 1: Clustering and Regression, Modeling-estimating, Forecasting and Data Mining, Volume 2, Wiley, New York 1-24, https://arxiv.org/abs/1703.09282

Akhanli, S. and Hennig, C. (2020) Calibrating and aggregating cluster validity indexes for context-adapted comparison of clusterings. Statistics and Computing, 30, 1523-1544, https://link.springer.com/article/10.1007/s11222-020-09958-2, https://arxiv.org/abs/2002.01822

Examples

Run this code

  
  set.seed(20000)
  options(digits=3)
  face <- rFace(10,dMoNo=2,dNoEy=0,p=2)
  dif <- dist(face)
  clusum <- list()
  clusum[[2]] <- list()
  cl12 <- kmeansCBI(face,2)
  cl13 <- kmeansCBI(face,3)
  cl22 <- claraCBI(face,2)
  cl23 <- claraCBI(face,2)
  ccl12 <- clustatsum(dif,cl12$partition)
  ccl13 <- clustatsum(dif,cl13$partition)
  ccl22 <- clustatsum(dif,cl22$partition)
  ccl23 <- clustatsum(dif,cl23$partition)
  clusum[[1]] <- list()
  clusum[[1]][[2]] <- ccl12
  clusum[[1]][[3]] <- ccl13
  clusum[[2]][[2]] <- ccl22
  clusum[[2]][[3]] <- ccl23
  clusum$maxG <- 3
  clusum$minG <- 2
  clusum$method <- c("kmeansCBI","claraCBI")
  clusum$name <- c("kmeansCBI","claraCBI")
  clusim <- randomclustersim(dist(face),G=2:3,nnruns=1,kmruns=1,
    fnruns=1,avenruns=1,monitor=FALSE)
  cgr <- cgrestandard(clusum,clusim,2:3)
  cgr2 <- cgrestandard(clusum,clusim,2:3,useallg=TRUE)
  cgr3 <- cgrestandard(clusum,clusim,2:3,percentage=TRUE)
  print(str(cgr))
  print(str(cgr2))
  print(cgr3[[1]][[2]])

Run the code above in your browser using DataLab