Do.similarity.matrix: Functions to compute a pairwise similarity matrix.

Description

The elements of a similarity matrix represent the frequency with which each pair of examples belongs to the same cluster across multiple clusterings. These functions may also be used when the clusterings differ in their number of clusters.

Usage

Do.similarity.matrix(l, dim.Sim.M)
Do.similarity.matrix.partition(l)

Value

A pairwise similarity matrix whose elements represent how often two examples fall in the same cluster across multiple clusterings. Each element of the matrix is normalized so that its value is between 0 and 1.

Arguments

l: list of clusterings. Each element is a list of clusters. Each cluster is a vector of integers that identify the examples belonging to it.
dim.Sim.M: dimension of the similarity matrix (number of examples)
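
As an illustration of the structure expected for l, the sketch below builds a small list of clusterings by hand and checks which of them are strict partitions. The data and the helper check.partition are hypothetical and are not part of the package; the sketch only assumes that examples are indexed 1..n.

```r
# Hypothetical data: 5 examples, 2 clusterings (a list of lists of vectors).
clusterings <- list(
  list(c(1L, 2L), c(3L, 4L, 5L)),      # a partition into 2 clusters
  list(c(1L, 2L, 3L), c(3L, 4L), 5L)   # example 3 belongs to 2 clusters
)
n.examples <- 5L

# Illustrative helper (not a package function): TRUE if every example
# 1..n appears in exactly one cluster of the clustering.
check.partition <- function(clustering, n) {
  members <- unlist(clustering)
  length(members) == n && setequal(members, seq_len(n))
}

partition.flags <- vapply(clusterings, check.partition, logical(1),
                          n = n.examples)
```

Under the rules given in Details, only the first clustering would be suitable input for Do.similarity.matrix.partition; a list containing the second one would call for Do.similarity.matrix.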

Author

Giorgio Valentini valentini@di.unimi.it

Details

An $n \times n$ similarity matrix M can be associated with a k-clustering; the elements $M_{ij}$ of M are defined as: $$ M_{ij} = \sum_{s=1}^k \chi_{A_s}[i] \cdot \chi_{A_s}[j] $$ where $i,j \in \{1,2,\ldots,n\}$, $A_s \subseteq \{1,2,\ldots,n\}$ is the s-th cluster, and $\chi_{A_s} \in \{0,1\}^n$ is the characteristic vector of $A_s$, i.e. $\chi_{A_s}[i] = 1$ if $i \in A_s$, otherwise $\chi_{A_s}[i] = 0$. If the k-clustering defines a partition, $M_{ij} \in \{0,1\}$: in other words, $M_{ij}$ indicates whether elements i and j belong to the same cluster.

Consider also a random projection $\mu : \mathcal{R}^d \rightarrow \mathcal{R}^{d'}$. A similarity matrix M can then be computed by averaging over multiple clusterings obtained from multiple random projections. This similarity matrix represents how often pairs of projected examples belong to the same cluster, averaged across the repeated random projections.

Do.similarity.matrix can be used with clusterings that do not strictly define a partition (that is, a given example may belong to more than one cluster). Do.similarity.matrix.partition may be used only with clusterings that strictly define a partition.
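
The definition above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper similarity.sketch and the toy data are hypothetical, each clustering is assumed to be a strict partition, and normalization is assumed to be division by the number of clusterings.

```r
# Illustrative helper (not a package function): average co-membership
# matrix for a list of clusterings over n examples.
similarity.sketch <- function(clusterings, n) {
  S <- matrix(0, nrow = n, ncol = n)
  for (clustering in clusterings) {
    for (cluster in clustering) {
      chi <- numeric(n)          # characteristic vector of the cluster
      chi[cluster] <- 1
      S <- S + tcrossprod(chi)   # adds chi[i] * chi[j] to every S[i, j]
    }
  }
  S / length(clusterings)        # assumed normalization: mean over clusterings
}

# Hypothetical data: 4 examples, 2 clusterings that are both partitions.
toy <- list(
  list(c(1L, 2L), c(3L, 4L)),
  list(c(1L, 2L, 3L), 4L)
)
S <- similarity.sketch(toy, n = 4)
```

Here examples 1 and 2 share a cluster in both clusterings, so S[1, 2] is 1, while examples 1 and 3 share a cluster in only one of the two, so S[1, 3] is 0.5.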

Examples

# Computing the similarity matrix associated to 20 hierarchical clusterings 
# using Normal projections. 
M <- generate.sample0(n=10, m=2, sigma=2, dim=800)
l.norm <- Multiple.Random.hclustering (M, dim=100, pmethod="Norm", c=3, 
                                       hmethod="average", n=20)
Sim <- Do.similarity.matrix.partition(l.norm)
# The same as above, but with 30 hierarchical clusterings using PMO projections. 
l.PMO <- Multiple.Random.hclustering (M, dim=100, pmethod="PMO", c=3, 
                                      hmethod="average", n=30)
Sim.PMO <- Do.similarity.matrix.partition(l.PMO)
