For a given dataset, this simulates random clusterings using stupidkcentroids, stupidknn, stupidkfn, and stupidkaven. It then computes and stores a set of cluster validity indexes for every clustering.
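To make the notion of a random clustering concrete, the following minimal sketch calls two of the generators directly on toy data. It assumes that the fpc package is installed, that stupidknn returns the clustering vector, and that stupidkcentroids returns a list whose partition component holds the clustering vector; the toy data and the object names are illustrative only.

```r
library(fpc)  # assumption: the fpc package is installed

set.seed(1)
x <- matrix(rnorm(40), ncol = 2)   # toy data: 20 observations, 2 variables
d <- dist(x)                       # Euclidean distances as a "dist" object

# Random partition into 3 clusters grown by nearest-neighbour joining
cl_nn <- stupidknn(d, k = 3)

# Random centroids; every observation is assigned to its closest centroid
cl_km <- stupidkcentroids(d, k = 3)$partition

table(cl_nn)   # cluster sizes of the stupidknn partition
table(cl_km)   # cluster sizes of the stupidkcentroids partition
```

randomclustersim repeats such draws for every number of clusters in G and evaluates each resulting partition.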
randomclustersim(datadist,datanp=NULL,npstats=FALSE,
G,nnruns=100,kmruns=100,fnruns=100,avenruns=100,
nnk=4,dnnk=2,
pamcrit=TRUE,
multicore=FALSE,cores=detectCores()-1,monitor=TRUE)
datadist: distances on which the validation measures are based; a dist object or distance matrix.
datanp: optional observations times variables data matrix, see npstats.
npstats: logical. If TRUE, distrsimilarity is called and the two statistics computed there are added to the output. These are based on datanp and require datanp to be specified.
G: vector of integers. Numbers of clusters to consider.
nnruns: integer. Number of runs of stupidknn.
kmruns: integer. Number of runs of stupidkcentroids.
fnruns: integer. Number of runs of stupidkfn.
avenruns: integer. Number of runs of stupidkaven.
nnk: nnk-argument to be passed on to cqcluster.stats.
dnnk: nnk-argument to be passed on to distrsimilarity.
pamcrit: pamcrit-argument to be passed on to cqcluster.stats.
multicore: logical. If TRUE, parallel computing is used through the function mclapply from package parallel; read the warnings there if you intend to use this. It won't work on Windows.
cores: integer. Number of cores for parallelisation.
monitor: logical. If TRUE, some runtime information is printed.
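The inputs described above can be prepared with base R and the parallel package alone. The following is a minimal sketch under stated assumptions: the toy data matrix and the helper names use_multicore and n_cores are hypothetical, and the fallback to one core is a defensive choice rather than something randomclustersim itself does.

```r
library(parallel)  # ships with R; provides detectCores() and mclapply()

set.seed(2)
x <- matrix(rnorm(60), ncol = 3)   # toy data: 20 observations, 3 variables

datadist <- dist(x)                # "dist" object, suitable for datadist
datanp   <- x                      # only required when npstats = TRUE
G        <- 2:4                    # numbers of clusters to simulate

# mclapply() relies on forking, which is not available on Windows
use_multicore <- .Platform$OS.type != "windows"

# detectCores() can return NA; fall back to a single core in that case
n_cores <- detectCores() - 1
if (is.na(n_cores) || n_cores < 1) n_cores <- 1
```

These objects could then be supplied as datadist, datanp, G, multicore = use_multicore, and cores = n_cores.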
List with components
nn: list, indexed by number of clusters. Every entry is a data frame with nnruns rows, one for every simulation run of stupidknn. The variables of the data frame are avewithin, mnnd, cvnnd, maxdiameter, widestgap, sindex, minsep, asw, dindex, denscut, highdgap, pearsongamma, withinss, entropy; if pamcrit=TRUE also pamc; if npstats=TRUE also kdnorm, kdunif. All these are cluster validation indexes, documented as values of clustatsum.
fn: list, indexed by number of clusters. Every entry is a data frame with fnruns rows, one for every simulation run of stupidkfn. The variables of the data frame are avewithin, mnnd, cvnnd, maxdiameter, widestgap, sindex, minsep, asw, dindex, denscut, highdgap, pearsongamma, withinss, entropy; if pamcrit=TRUE also pamc; if npstats=TRUE also kdnorm, kdunif. All these are cluster validation indexes, documented as values of clustatsum.
aven: list, indexed by number of clusters. Every entry is a data frame with avenruns rows, one for every simulation run of stupidkaven. The variables of the data frame are avewithin, mnnd, cvnnd, maxdiameter, widestgap, sindex, minsep, asw, dindex, denscut, highdgap, pearsongamma, withinss, entropy; if pamcrit=TRUE also pamc; if npstats=TRUE also kdnorm, kdunif. All these are cluster validation indexes, documented as values of clustatsum.
km: list, indexed by number of clusters. Every entry is a data frame with kmruns rows, one for every simulation run of stupidkcentroids. The variables of the data frame are avewithin, mnnd, cvnnd, maxdiameter, widestgap, sindex, minsep, asw, dindex, denscut, highdgap, pearsongamma, withinss, entropy; if pamcrit=TRUE also pamc; if npstats=TRUE also kdnorm, kdunif. All these are cluster validation indexes, documented as values of clustatsum.
nnruns: number of involved runs of stupidknn.
fnruns: number of involved runs of stupidkfn.
avenruns: number of involved runs of stupidkaven.
kmruns: number of involved runs of stupidkcentroids.
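The returned list can be inspected with ordinary list and data frame tools. The sketch below reuses the small call from this page's own example and assumes that the fpc package is installed and that the component holding the stupidkcentroids results is named km. Because the exact indexing of the inner list is not spelled out here, the code takes the last entry, which corresponds to the largest number of clusters in G.

```r
library(fpc)  # assumption: the fpc package is installed

set.seed(20000)
face <- rFace(10, dMoNo = 2, dNoEy = 0, p = 2)   # small artificial dataset
sim <- randomclustersim(dist(face), datanp = face, npstats = TRUE, G = 2:3,
                        nnruns = 2, kmruns = 2, fnruns = 1, avenruns = 1,
                        nnk = 2)

names(sim)   # component names of the returned list

# Assumption: the stupidkcentroids results are stored in component "km".
# The last entry belongs to the largest number of clusters in G (here 3).
km_last <- sim$km[[length(sim$km)]]

nrow(km_last)   # one row per stupidkcentroids run
km_last$asw     # average silhouette width of each random clustering
```

The spread of such index values over the random clusterings gives a reference against which the same index computed on a clustering of interest can be judged.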
Hennig, C. (2017) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Proceedings of ASMDA 2017, 501-520, https://arxiv.org/abs/1703.09282
Akhanli, S. and Hennig, C. (2020) Calibrating and aggregating cluster validity indexes for context-adapted comparison of clusterings. On arxiv from February 2020.
stupidkcentroids, stupidknn, stupidkfn, stupidkaven, clustatsum
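# NOT RUN {
library(fpc)  # provides rFace() and randomclustersim()
set.seed(20000)
options(digits=3)
# Small artificial dataset; the run counts are kept tiny so the example is fast
face <- rFace(10,dMoNo=2,dNoEy=0,p=2)
randomclustersim(dist(face),datanp=face,npstats=TRUE,G=2:3,
                 nnruns=2,kmruns=2,fnruns=1,avenruns=1,nnk=2)
# }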