Validity.indices: Function to compute the validity index of each cluster.

Description

It computes the validity index (e.g. the stability index) of each individual cluster. This function is called by Cluster.validity and Cluster.validity.from.similarity.

Usage

Validity.indices(cluster, c, Sim.M)

Value

A vector of validity indices: each element is the validity index of the corresponding cluster in cluster.

Arguments

cluster: list of clusters representing a clustering in the original space. Each element of the list is a vector whose elements are the examples belonging to the cluster.
c: number of clusters
Sim.M: the pairwise similarity matrix
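To make the Sim.M argument concrete, the R sketch below builds one kind of pairwise similarity matrix: entry [i, j] is the fraction of a set of partitions in which examples i and j fall in the same cluster, in line with the interpretation given under Details. This is an illustrative assumption, not the implementation of Do.similarity.matrix.partition. The helper name co.occurrence.similarity is hypothetical, and encoding each partition as a vector of cluster labels is a simplification made here for brevity.

```r
# Hypothetical helper (not part of clusterv): pairwise similarity as the
# fraction of partitions in which examples i and j share a cluster.
# Each partition is assumed to be a vector of cluster labels, one per example.
co.occurrence.similarity <- function(partitions) {
  n <- length(partitions[[1]])
  S <- matrix(0, n, n)
  for (labels in partitions) {
    # outer(..., "==") is TRUE where examples i and j carry the same label
    S <- S + outer(labels, labels, "==")
  }
  S / length(partitions)
}

# Three toy partitions of five examples
parts <- list(c(1, 1, 2, 2, 2),
              c(1, 1, 1, 2, 2),
              c(2, 2, 1, 1, 1))
Sim.M <- co.occurrence.similarity(parts)
# Examples 1 and 2 are always together (1); examples 3 and 4 in 2 of 3 partitions
```

Such a matrix is symmetric, has ones on the diagonal and entries in [0, 1], which is what keeps the stability index described under Details between 0 and 1.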

Author

Giorgio Valentini valentini@di.unimi.it

Details

Using the similarity matrix M, the stability index s of a cluster A is: $$ s(A) = \frac{1}{|A|(|A|-1)} \sum_{(i,j) \in A \times A, i\neq j} M_{ij} $$ that is, the mean of the off-diagonal entries of M restricted to the rows and columns of A. The index $s(A)$ estimates the stability of a cluster $A$ by measuring how often the projections of the pairs $(i,j) \in A \times A$ fall in the same cluster in the projected subspaces. The stability index ranges between 0 and 1: low values indicate unreliable clusters, high values denote stable clusters.
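The formula can be sketched in R as follows, assuming M is a square numeric matrix indexed by example and each cluster is a vector of example indices (as in the list returned by rect.hclust). This is an illustrative reimplementation of the formula, not the package's own code; the helper name stability.index and the choice to return NA for singleton clusters (where |A| - 1 = 0) are assumptions.

```r
# Hypothetical helper (not part of clusterv): stability index s(A) of a
# cluster A, given as a vector of example indices into the similarity matrix M.
stability.index <- function(A, M) {
  k <- length(A)
  # s(A) is undefined for singletons; returning NA here is an assumption
  if (k < 2) return(NA_real_)
  sub <- M[A, A, drop = FALSE]
  # Sum over ordered pairs (i, j) with i != j: drop the diagonal terms
  (sum(sub) - sum(diag(sub))) / (k * (k - 1))
}

# Toy similarity matrix for four examples and a two-cluster partition
M <- matrix(c(1.0, 0.8, 0.1, 0.2,
              0.8, 1.0, 0.3, 0.0,
              0.1, 0.3, 1.0, 0.6,
              0.2, 0.0, 0.6, 1.0), nrow = 4, byrow = TRUE)
clusters <- list(c(1, 2), c(3, 4))
s <- sapply(clusters, stability.index, M = M)  # one index per cluster: 0.8, 0.6
```

Applying the helper to every element of the cluster list with sapply yields one value per cluster, the same shape as the vector described under Value.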

Examples


# Compute the stability index of each cluster found by a hierarchical clustering algorithm
library(clusterv)
M <- generate.sample0(n=10, m=2, sigma=2, dim=800)
d <- dist(t(M))                       # examples are the columns of M
tree <- hclust(d, method = "average")
plot(tree, main = "")
cl.orig <- rect.hclust(tree, k = 3)   # list of 3 clusters of example indices
# Repeated hierarchical clusterings on random projections of M (dim = 100)
l.norm <- Multiple.Random.hclustering(M, dim=100, pmethod="Norm",
                                      c=3, hmethod="average", n=20)
Sim <- Do.similarity.matrix.partition(l.norm)
val.indices <- Validity.indices(cl.orig, c=3, Sim)
