Visualisation and print function for cluster validation output compared to results on simulated random clusterings. The print method can also be used to compute and print an aggregated cluster validation index.

```
# S3 method for valstat
plot(x,simobject=NULL,statistic="sindex",
xlim=NULL,ylim=c(0,1),
nmethods=length(x)-5,
col=1:nmethods,cex=1,pch=c("c","f","a","n"),
simcol=rep(grey(0.7),4),
shift=c(-0.1,-1/3,1/3,0.1),include.othernc=NULL,...)
```

```
# S3 method for valstat
print(x,statistics=x$statistics,
nmethods=length(x)-5,aggregate=FALSE,
weights=NULL,digits=2,
include.othernc=NULL,...)
```

x

object of class `"valstat"`, such as the sublists `stat`, `qstat`, `sstat` of the `clusterbenchstats` output.

simobject

list of simulation results as produced by `randomclustersim` and documented there; typically the sublist `sim` of the `clusterbenchstats` output.

statistic

one of `"avewithin"`, `"mnnd"`, `"variation"`, `"diameter"`, `"gap"`, `"sindex"`, `"minsep"`, `"asw"`, `"dindex"`, `"denscut"`, `"highdgap"`, `"pg"`, `"withinss"`, `"entropy"`, `"pamc"`, `"kdnorm"`, `"kdunif"`, `"dmode"`; the validation statistic to be plotted.

xlim

passed on to `plot`. Default is the range of all involved numbers of clusters, from the minimum minus 0.5 to the maximum plus 0.5.

ylim

passed on to `plot`.

nmethods

integer. Number of clustering methods to involve (these are methods 1 to `nmethods` as specified in `x$name`).

col

colours used for the different clustering methods.

cex

passed on to `plot`.

pch

vector of symbols for random clustering results from `stupidkcentroids`, `stupidkfn`, `stupidkaven`, `stupidknn`. Passed on to `plot`.

simcol

vector of colours used for random clustering results in the order `stupidkcentroids`, `stupidkfn`, `stupidkaven`, `stupidknn`.

shift

numeric vector. Indicates how far the results from `stupidkcentroids`, `stupidkfn`, `stupidkaven`, `stupidknn` are plotted to the right of their respective number of clusters (negative numbers plot to the left).

include.othernc

indicates whether methods should be included that estimated their number of clusters themselves and gave a result outside the standard range given by `x$minG` and `x$maxG`. If not `NULL`, this is a list of integer vectors of length 2. The first number is the number of the clustering method (the order is determined by `x$name`); the second number is the number of clusters that this method estimated outside the standard range. Normally what will be used here, if not `NULL`, is the output parameter `cm$othernc` of `clusterbenchstats`; see also `cluster.magazine`.

statistics

vector of character strings specifying the validation statistics that will be included in the output (unless you want to restrict the output for some reason, the default should be fine).

aggregate

logical. If `TRUE`, an aggregate validation statistic will be computed as the weighted mean of the involved statistics. This requires `weights` to be set. For this to make sense, values of the validation statistics should be comparable, which is achieved by standardisation in `clusterbenchstats`. Accordingly, `x` should be the `qstat` or `sstat` component of the `clusterbenchstats` output rather than the `stat` component.

weights

vector of numericals. Weights for computation of the aggregate statistic in case that `aggregate=TRUE`. The order of clustering methods corresponding to the weight vector is given by `x$name`.

digits

minimal number of significant digits, passed on to `print.table`.

...

no effect.
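The descriptions of `xlim`, `shift` and `include.othernc` above can be illustrated with a small base R sketch. It does not call the package; the range `G`, the object names `xlim_default`, `xpos` and `othernc`, and the example entry in `othernc` are hypothetical and only show the shapes of the values involved.

```r
# Hypothetical range of numbers of clusters (e.g. G=2:3 in clusterbenchstats)
G <- 2:3

# Default xlim as described: minimum minus 0.5 to maximum plus 0.5
xlim_default <- c(min(G) - 0.5, max(G) + 0.5)

# Default shifts for the four random clustering methods, in documented order
shift <- c(-0.1, -1/3, 1/3, 0.1)
random_methods <- c("stupidkcentroids", "stupidkfn", "stupidkaven", "stupidknn")

# Horizontal plotting positions: number of clusters plus shift
# (one row per number of clusters, one column per random method)
xpos <- outer(G, shift, "+")
dimnames(xpos) <- list(as.character(G), random_methods)

# All shifted positions stay inside the default x range
inside <- all(xpos > xlim_default[1] & xpos < xlim_default[2])

# Assumed shape of include.othernc: a list of integer pairs
# c(method number, estimated number of clusters); this entry is made up
othernc <- list(c(2L, 5L))
```

With the default shifts, the largest offset is 1/3, so the shifted points remain within 0.5 of their number of clusters and do not overlap the neighbouring one.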

`print.valstat` returns the results table as an invisible object.

Whereas `print.valstat`, at least with `aggregate=TRUE`, makes more sense for the `qstat` or `sstat` component of the `clusterbenchstats` output rather than the `stat` component, `plot.valstat` should be run with the `stat` component if `simobject` is specified, because the simulated cluster validity statistics are unstandardised and need to be compared with unstandardised values on the dataset of interest.

`print.valstat` prints all values for all validation indexes; the aggregated index (if `aggregate=TRUE` and `weights` is set) is printed last.
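The aggregation as a weighted mean can be sketched in base R as follows. This is only an illustration, not the package's code: the values in `vals` are made up, and it assumes one weight per validation statistic (as the length-16 weight vector in the example suggests) and normalisation by the sum of the weights, as done by `weighted.mean`.

```r
# Hypothetical standardised values: rows are clusterings, columns are statistics
vals <- rbind(
  kmeans = c(avewithin = 0.8, sindex = 0.2, asw = 0.5),
  ward   = c(avewithin = 0.4, sindex = 0.6, asw = 0.8)
)

# Assumed: one weight per statistic; a weight of 0 drops that statistic
weights <- c(1, 0, 1)

# Weighted mean of the statistics for each clustering
agg <- apply(vals, 1, weighted.mean, w = weights)
```

Here only `avewithin` and `asw` contribute, so each aggregate is the plain average of those two values. Such an average is only meaningful when the statistics are on a comparable scale, which is why the standardised components are recommended for aggregation.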

Hennig, C. (2019) Cluster validation by measurement of clustering
characteristics relevant to the user. In C. H. Skiadas (ed.)
*Data Analysis and Applications 1: Clustering and Regression,
Modeling-estimating, Forecasting and Data Mining, Volume 2*, Wiley,
New York 1-24,
https://arxiv.org/abs/1703.09282

Akhanli, S. and Hennig, C. (2020) Calibrating and aggregating cluster
validity indexes for context-adapted comparison of clusterings.
*Statistics and Computing*, 30, 1523-1544,
https://link.springer.com/article/10.1007/s11222-020-09958-2, https://arxiv.org/abs/2002.01822

```
# NOT RUN {
library(fpc)  # provides rFace, clusterbenchstats and the valstat methods
set.seed(20000)
options(digits=3)
face <- rFace(10,dMoNo=2,dNoEy=0,p=2)
clustermethod <- c("kmeansCBI","hclustCBI","hclustCBI")
clustermethodpars <- list()
clustermethodpars[[2]] <- clustermethodpars[[3]] <- list()
clustermethodpars[[2]]$method <- "ward.D2"
clustermethodpars[[3]]$method <- "single"
methodname <- c("kmeans","ward","single")
cbs <- clusterbenchstats(face,G=2:3,clustermethod=clustermethod,
  methodname=methodname,distmethod=rep(FALSE,3),
  clustermethodpars=clustermethodpars,nnruns=2,kmruns=2,fnruns=2,avenruns=2)
plot(cbs$stat,cbs$sim)
plot(cbs$stat,cbs$sim,statistic="dindex")
plot(cbs$stat,cbs$sim,statistic="avewithin")
print(cbs$sstat,aggregate=TRUE,weights=c(1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0))
# Some of the values are "NaN" because, due to the low number of runs of
# the stupid clustering methods, there is no variation. If this happens
# in a real application, nnruns etc. should be chosen higher than 2.
# Also useallg=TRUE in clusterbenchstats may help.
# }
```