
Standardises cluster validity statistics as produced by clustatsum relative to results that were achieved by random clusterings on the same data by randomclustersim. The aim is to make differences between values comparable between indexes, see Hennig (2017), Akhanli and Hennig (2020).
This is mainly for use within clusterbenchstats.
cgrestandard(clusum,clusim,G,percentage=FALSE,
useallmethods=FALSE,
useallg=FALSE, othernc=list())
clusum: object of class "valstat", see clusterbenchstats.
clusim: list; output object of randomclustersim, see there.
G: vector of integers. Numbers of clusters to consider.
percentage: logical. If FALSE, standardisation is done to mean zero and standard deviation 1 using the random clusterings. If TRUE, the output is the percentage of simulated values below the result (more precisely, this number plus one divided by the total plus one).
useallmethods: logical. If FALSE, only random clustering results from clusim are used for standardisation. If TRUE, clustering results from other methods as given in clusum are used as well.
useallg: logical. If TRUE, standardisation uses results from all numbers of clusters in G. If FALSE, standardisation of results for a specific number of clusters only uses results from that number of clusters.
othernc: list of integer vectors of length 2. This allows the incorporation of methods that bring forth other numbers of clusters than those in G, for example because a method may have automatically estimated a number of clusters. The first number is the number of the clustering method (the order is determined by argument clustermethod in clusterbenchstats), the second number is the number of clusters. Results specified here are only standardised if useallg=TRUE.
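The two standardisation modes selected by percentage can be illustrated with a minimal base R sketch. The helper restandardise_sketch and the numbers are hypothetical and not part of fpc. The sketch assumes that "below" means strictly smaller and that the standard deviation is the usual sample standard deviation as computed by sd(); the actual implementation in cgrestandard may differ in such details.

```r
# Hypothetical helper, not part of fpc: standardise one observed index value
# against values of the same index obtained from random clusterings.
restandardise_sketch <- function(observed, random_values, percentage = FALSE) {
  if (percentage) {
    # (number of random values below the observed one + 1) / (total + 1)
    (sum(random_values < observed) + 1) / (length(random_values) + 1)
  } else {
    # centre and scale using mean and standard deviation of the random values
    (observed - mean(random_values)) / sd(random_values)
  }
}

random_values <- c(0.2, 0.4, 0.6, 0.8)  # made-up index values
z <- restandardise_sketch(0.7, random_values)
p <- restandardise_sketch(0.7, random_values, percentage = TRUE)
```

With useallmethods=TRUE the reference set would, according to the description above, also contain the values achieved by the other clustering methods in clusum, not only the random ones.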
List of class "valstat", see valstat.object, with standardised results as explained above.
cgrestandard will add a statistic named dmode to the input set of validation statistics, which is defined as 0.75*dindex+0.25*highdgap, aggregating these two closely related statistics, see clustatsum.
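As a small arithmetic illustration of the dmode formula, the following sketch applies the weights to made-up values. The list stats only stands in for the output of clustatsum; the numbers are invented, and the sketch makes no claim about the point at which cgrestandard computes the aggregate internally.

```r
# Made-up values standing in for two statistics returned by clustatsum
stats <- list(dindex = 0.6, highdgap = 0.2)

# Weighted aggregate as defined in the text above
stats$dmode <- 0.75 * stats$dindex + 0.25 * stats$highdgap
```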
Hennig, C. (2017) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Proceedings of ASMDA 2017, 501-520, https://arxiv.org/abs/1703.09282
Akhanli, S. and Hennig, C. (2020) Calibrating and aggregating cluster validity indexes for context-adapted comparison of clusterings. Accepted for publication by Statistics and Computing, https://arxiv.org/abs/2002.01822
valstat.object, clusterbenchstats, stupidkcentroids, stupidknn, stupidkfn, stupidkaven, clustatsum
# NOT RUN {
library(fpc)  # provides rFace, kmeansCBI, claraCBI, clustatsum, randomclustersim, cgrestandard
set.seed(20000)
options(digits=3)
face <- rFace(10,dMoNo=2,dNoEy=0,p=2)
dif <- dist(face)
clusum <- list()
clusum[[2]] <- list()
cl12 <- kmeansCBI(face,2)
cl13 <- kmeansCBI(face,3)
cl22 <- claraCBI(face,2)
cl23 <- claraCBI(face,3)  # 3 clusters, since this is stored as the 3-cluster result below
ccl12 <- clustatsum(dif,cl12$partition)
ccl13 <- clustatsum(dif,cl13$partition)
ccl22 <- clustatsum(dif,cl22$partition)
ccl23 <- clustatsum(dif,cl23$partition)
clusum[[1]] <- list()
clusum[[1]][[2]] <- ccl12
clusum[[1]][[3]] <- ccl13
clusum[[2]][[2]] <- ccl22
clusum[[2]][[3]] <- ccl23
clusum$maxG <- 3
clusum$minG <- 2
clusum$method <- c("kmeansCBI","claraCBI")
clusum$name <- c("kmeansCBI","claraCBI")
clusim <- randomclustersim(dist(face),G=2:3,nnruns=1,kmruns=1,
fnruns=1,avenruns=1,monitor=FALSE)
cgr <- cgrestandard(clusum,clusim,2:3)
cgr2 <- cgrestandard(clusum,clusim,2:3,useallg=TRUE)
cgr3 <- cgrestandard(clusum,clusim,2:3,percentage=TRUE)
str(cgr)   # str() prints by itself; wrapping it in print() would only add a trailing NULL
str(cgr2)
print(cgr3[[1]][[2]])
# }