Zsimilarity: Summarises MCMC clustering labels with a similarity matrix and finds the 'average' clustering

Description

This function takes a Monte Carlo sample of cluster labels, converts each sample to an adjacency matrix, and computes a similarity matrix as the average of these adjacency matrices. The dimension of the similarity matrix is invariant to label switching and to the number of clusters in each sample. As a summary of the posterior clustering, the index of the clustering with minimum squared distance to this 'average' clustering is reported.
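
The computation described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names adjacency and similarity_sketch are hypothetical, the label matrix is assumed to hold one observation per row and one sample per column (as stated under Arguments), and a dense matrix is used where Zsimilarity returns a sparse one.

```r
# Hypothetical helper: N x N adjacency matrix for one label vector,
# with entry (i, j) equal to 1 when observations i and j share a label.
adjacency <- function(z) outer(z, z, "==") * 1

# Hypothetical sketch of the summary; zs is assumed to be N x M
# (observations in rows, sampled clusterings in columns).
similarity_sketch <- function(zs) {
  adj   <- lapply(seq_len(ncol(zs)), function(m) adjacency(zs[, m]))
  z.sim <- Reduce(`+`, adj) / length(adj)   # average of the adjacency matrices
  dists <- vapply(adj, function(a) sum((a - z.sim)^2), numeric(1))
  best  <- which.min(dists)                 # sample closest to the average
  list(z.avg = zs[, best], z.sim = z.sim, dist.z = dists, index = best)
}

# Toy input: 4 observations, 3 samples. Samples 1 and 2 describe the same
# partition under switched labels; sample 3 puts everything in one cluster.
zs  <- cbind(c(1, 1, 2, 2), c(2, 2, 1, 1), c(1, 1, 1, 1))
res <- similarity_sketch(zs)
```

Because the adjacency matrix depends only on which observations share a label, samples 1 and 2 in the toy input contribute identical matrices and receive the same distance, which illustrates the invariance to label switching.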

Usage

Zsimilarity(zs)

Arguments

zs: A matrix containing samples of clustering labels, where the rows correspond to the observations and the columns correspond to the iterations.

Value

A list containing three elements:

z.avg: The 'average' clustering, with minimum squared distance to z.sim.
z.sim: The N x N similarity matrix, in a sparse format (see as.simple_triplet_matrix). If the data have been previously ordered, an (ordered) heatmap may provide a useful visualisation. The user is also invited to perform hierarchical clustering using hclust after first converting this similarity matrix to a distance matrix; "complete" linkage is recommended.
dist.z: A vector recording the distance between each sampled clustering and the 'average' clustering, with one entry per clustering (i.e. per iteration).
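
As a rough illustration of the hierarchical clustering suggestion above, the following base R sketch converts a small dense similarity matrix into a distance matrix and cuts the resulting tree. The matrix values and the choice of k=2 are invented for illustration; real output would first need converting to a dense matrix with as.matrix.

```r
# Toy 4 x 4 similarity matrix (values invented for illustration)
z.sim <- matrix(c(1.0, 0.9, 0.1, 0.2,
                  0.9, 1.0, 0.2, 0.1,
                  0.1, 0.2, 1.0, 0.8,
                  0.2, 0.1, 0.8, 1.0), nrow=4, byrow=TRUE)

# A similarity in [0, 1] becomes a distance via 1 - similarity
Hcl    <- hclust(as.dist(1 - z.sim), method="complete")

# Cut the dendrogram into two groups of observations
hier.z <- cutree(Hcl, k=2)
```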

Examples

# NOT RUN {
# Run an IMIFA model and extract the sampled cluster labels
# data(olive)
# sim    <- mcmc_IMIFA(olive, method="IMIFA", n.iters=5000)
# zs     <- sim[[1]][[1]]$z.store

# Get the similarity matrix and visualise it
# zsimil <- Zsimilarity(zs)
# z.sim  <- as.matrix(zsimil$z.sim)
# z.sim2 <- replace(z.sim, z.sim == 0, NA)
# image(z.sim2, col=heat.colors(30)[30:1]); box(lwd=2)

# Extract the clustering with minimum squared distance to this
# 'average' and evaluate its performance against the true labels
# z.avg  <- zsimil$z.avg
# table(z.avg, olive$area)

# Perform hierarchical clustering on the distance matrix
# Hcl    <- hclust(as.dist(1 - z.sim), method="complete")
# plot(Hcl)
# hier.z <- cutree(Hcl, k=3)
# table(hier.z, olive$area)
# }
