cluster.magazine: Run many clustering methods on many numbers of clusters

Description

Runs a user-specified set of clustering methods (CBI-functions, see kmeansCBI) with several numbers of clusters on a dataset and returns the results in a unified output format.

Usage

cluster.magazine(data, G, diss = inherits(data, "dist"),
                 scaling = TRUE, clustermethod,
                 distmethod = rep(TRUE, length(clustermethod)),
                 ncinput = rep(TRUE, length(clustermethod)),
                 clustermethodpars,
                 trace = TRUE)

Value

List of lists comprising

output: Two-dimensional list. The first list index i is the number of the clustering method (ordering as specified in clustermethod), the second list index j is the number of clusters. This stores the full output of clustermethod i run on number of clusters j.
clustering: Two-dimensional list. The first list index i is the number of the clustering method (ordering as specified in clustermethod), the second list index j is the number of clusters. This stores the clustering integer vector (i.e., the partition-component of the CBI-function, see kmeansCBI) of clustermethod i run on number of clusters j.
noise: Two-dimensional list. The first list index i is the number of the clustering method (ordering as specified in clustermethod), the second list index j is the number of clusters. List entries are single logicals. If TRUE, the clustering method estimated some noise, i.e., points not belonging to any cluster. In the clustering vector these points are indicated by the highest number (number of clusters plus one if the number of clusters was fixed).
othernc: list of integer vectors of length 2, one for each method that estimates the number of clusters itself and arrives at a number that is smaller than min(G) or larger than max(G). The first number is the number of the clustering method (ordering as specified in clustermethod), the second number is the estimated number of clusters.
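
The indexing described above can be sketched as follows. This is a minimal illustration, not part of the original help page: it assumes the fpc package is installed, and it uses the built-in iris measurements and the names x, methods, pars and part purely for demonstration.

```r
library(fpc)

set.seed(1)
x <- as.matrix(iris[, 1:4])   # illustrative data, not from the original page

methods <- c("kmeansCBI", "hclustCBI")
pars <- list(list(), list(method = "average"))

cmf <- cluster.magazine(x, G = 2:3, clustermethod = methods,
                        distmethod = rep(FALSE, 2),
                        clustermethodpars = pars, trace = FALSE)

# Method 1 (kmeansCBI) with 2 clusters: one cluster label per observation
part <- cmf$clustering[[1]][[2]]
table(part)

# Method 2 (hclustCBI, average linkage) with 3 clusters: noise flag
cmf$noise[[2]][[3]]

# Full CBI output of method 2 with 3 clusters
names(cmf$output[[2]][[3]])
```

Because the second index is the number of clusters itself rather than a position in G, entries for cluster numbers outside G (here index 1) are expected to be empty.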

Arguments

data: data matrix or dist-object.
G: vector of integers. Numbers of clusters to consider.
diss: logical. If TRUE, the data matrix is assumed to be a distance/dissimilarity matrix, otherwise it is assumed to be an observations times variables matrix.
scaling: either a logical or a numeric vector of length equal to the number of columns of data. If FALSE, data won't be scaled, otherwise scaling is passed on to scale as argument scale.
clustermethod: vector of strings specifying names of CBI-functions (see kmeansCBI). These are the clustering methods to be applied.
distmethod: vector of logicals, of the same length as clustermethod. TRUE means that the clustering method operates on distances. If diss=TRUE, all entries have to be TRUE. Otherwise, if an entry is TRUE, the corresponding method will be applied to dist(data).
ncinput: vector of logicals, of the same length as clustermethod. TRUE indicates that the corresponding clustering method requires the number of clusters as input and will not estimate the number of clusters itself.
clustermethodpars: list of the same length as clustermethod. Specifies parameters for all involved clustering methods. Its jth entry is passed to clustermethod number j. An entry can be empty if all defaults are used for a clustering method. The number of clusters does not need to be specified here.
trace: logical. If TRUE, some runtime information is printed.
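
What the scaling and distmethod arguments imply for the input that a clustering method receives can be sketched in base R. This is only an illustration under assumptions: the iris data and the names x, sx, sv and d are not from the original page, and it is assumed that scale is called with its default centring and that distances are computed on the scaled data.

```r
x <- as.matrix(iris[, 1:4])   # illustrative observations times variables matrix

# scaling = TRUE: each column is centred and divided by its standard deviation
sx <- scale(x, scale = TRUE)
round(apply(sx, 2, sd), 3)

# scaling as a numeric vector: each centred column is divided by the given value
sv <- scale(x, scale = c(1, 1, 2, 2))

# Input for a method with distmethod = TRUE when diss = FALSE
# (assumption: the distances are taken on the scaled data)
d <- dist(sx)
class(d)
```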

Author

Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en/

References

Hennig, C. (2017) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Proceedings of ASMDA 2017, 501-520, https://arxiv.org/abs/1703.09282

Examples

  library(fpc)

  set.seed(20000)
  options(digits=3)
  face <- rFace(10,dMoNo=2,dNoEy=0,p=2)
  # A clustering method can be used more than once, with different
  # parameters
  clustermethod <- c("kmeansCBI","hclustCBI","hclustCBI")
  clustermethodpars <- list()
  clustermethodpars[[2]] <- clustermethodpars[[3]] <- list()
  clustermethodpars[[2]]$method <- "complete"
  clustermethodpars[[3]]$method <- "average"
  cmf <- cluster.magazine(face,G=2:3,clustermethod=clustermethod,
    distmethod=rep(FALSE,3),clustermethodpars=clustermethodpars)
  # str() prints the structure itself and returns NULL, so print() is not needed
  str(cmf)
