findrep: Finding representatives for cluster border

Description

Finds representative objects for the border of a cluster and computes the within-cluster variance, both as defined in the framework of the cdbw cluster validation index. It is meant to be used in that context.

Usage

findrep(x,xcen,clustering,cluster,r,p=ncol(x),n=nrow(x),
                    nc=sum(clustering==cluster))

Arguments

x

matrix. Euclidean dataset.

xcen

mean vector of cluster.

clustering

vector of integers of length nrow(x), indicating the cluster of each observation.

cluster

integer. Number of the cluster to be treated.

r

integer. Number of representatives.

p

integer. Number of dimensions.

n

integer. Number of observations.

nc

integer. Number of observations in cluster.

Value

List with components

repc

vector of indices of the representatives (out of all observations).

repx

vector of indices of the representatives (out of only the observations in cluster).

maxr

number of representatives (this can be smaller than r if the cluster contains fewer pairwise different observations).

wvar

estimated average within-cluster squared distance to mean.
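
To make the meaning of wvar more concrete, the following base R sketch computes squared Euclidean distances to a cluster mean and averages them. It is only an illustration: the names xx, sqdist and avg_sqdist are hypothetical, and the exact divisor used by findrep is not stated on this page, so the result need not coincide with wvar.

```r
# Small subset of iris: 5 observations from each of the 3 species
iriss <- as.matrix(iris[c(1:5, 51:55, 101:105), -5])
irisc <- as.numeric(iris[c(1:5, 51:55, 101:105), 5])

cl <- 1                                   # cluster to be treated
xx <- iriss[irisc == cl, , drop = FALSE]  # observations in that cluster
xcen <- colMeans(xx)                      # mean vector of the cluster

# Squared Euclidean distance of each cluster member to the cluster mean
sqdist <- rowSums(sweep(xx, 2, xcen)^2)

# Plain average (assumption: divides by the cluster size)
avg_sqdist <- mean(sqdist)
avg_sqdist
```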

References

Halkidi, M. and Vazirgiannis, M. (2008) A density-based cluster validity approach using multi-representatives. Pattern Recognition Letters 29, 773-786.

Halkidi, M., Vazirgiannis, M. and Hennig, C. (2015) Method-independent indices for cluster validation. In C. Hennig, M. Meila, F. Murtagh, R. Rocci (eds.) Handbook of Cluster Analysis, CRC Press/Taylor & Francis, Boca Raton.

Examples

# NOT RUN {
  options(digits=3)
  iriss <- as.matrix(iris[c(1:5,51:55,101:105),-5])
  irisc <- as.numeric(iris[c(1:5,51:55,101:105),5])
  findrep(iriss,colMeans(iriss),irisc,cluster=1,r=2)
# }
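
The example above passes the column means of all rows as xcen. Since xcen is documented as the mean vector of the cluster, a variant that uses the mean of cluster 1 only might look like the sketch below. It assumes the fpc package, which provides findrep, is installed; the names cl, xcen1 and rep1 are hypothetical.

```r
library(fpc)  # assumption: fpc is installed and provides findrep

iriss <- as.matrix(iris[c(1:5, 51:55, 101:105), -5])
irisc <- as.numeric(iris[c(1:5, 51:55, 101:105), 5])

cl <- 1
# Mean vector computed from the observations of cluster cl only
xcen1 <- colMeans(iriss[irisc == cl, , drop = FALSE])

rep1 <- findrep(iriss, xcen1, irisc, cluster = cl, r = 2)

# repc indexes rows of the full data matrix
iriss[rep1$repc, , drop = FALSE]

# repx indexes rows within the cluster; mapping it back through
# which() should give the same rows as repc
which(irisc == cl)[rep1$repx]

rep1$maxr  # number of representatives actually found
rep1$wvar  # within-cluster variance estimate
```

Because repc refers to the full dataset and repx to the cluster subset, repc is usually the more convenient component when the representatives are looked up in x.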