spectral.clust: Spectral Clustering using Similarity or Distance Matrices

Description

This function calculates performs spectral clustering with the k-means step using precomputed similarity or distance matrices, and returns a vector of cluster assignments.

Usage

spectral.clust(S, k, nstart = 10, iter.max = 1000, 
              is.sim = NULL, neighbours = 10)

Value

spectral.clust returns a list object with the following components:

clusters: an \(n\)-variate integer vector indicating the cluster assignment for each observation, as determined by the kmeans algorithm.
S: the original \(n \times n\) numeric matrix used as input, representing either pairwise similarities or distances between observations, depending on the is.sim argument.

Arguments

S: a \(n \times n\) numeric matrix representing either pairwise similarities or distances between observations. The matrix can be a similarity matrix or a distance matrix, as indicated by the is.sim argument.
k: integer value specifying the number of clusters to form. This is passed to the kmeans algorithm.
nstart: integer value specifying the number of random starts for the bandwidth estimation. Defaults to 3 or the number of variables, whichever is larger.
iter.max: integer value specifying the maximum number of iterations for the kmeans algorithm. Defaults to 1000.
is.sim: logical value indicating whether the input matrix S is a similarity matrix. If set to TRUE, S is treated as a similarity matrix. If set to FALSE, S is treated as a distance matrix. Must be specified.
neighbours: integer value specifying the number of nearest neighbours to consider when constructing the graph Laplacian. This helps in determining the structure of the graph from the similarity or distance matrix. Defaults to 10.

Author

John R. J. Thompson john.thompson@ubc.ca, Jesse S. Ghashti jesse.ghashti@ubc.ca

Details

spectral.clust implements spectral clustering on pairwise similarity or distance matrices, following the method described by Ng et al. (2001). The function first constructs an adjacency matrix from the input similarity or distance matrix S using the neighbours parameter to define the nearest connections. If S is a similarity matrix (is.sim = TRUE), the function retains the largest values corresponding to the neighbours nearest observations. If S is a distance matrix (is.sim = FALSE), it retains the smallest values for the nearest observations. The adjacency matrix is symmetrized and used to compute the unnormalized Laplacian matrix. The eigenvectors corresponding to the smallest eigenvalues of the Laplacian are extracted and clustered using the kmeans algorithm. The number of clusters, k, and parameters such as the number of random starts (nstart) and maximum iterations (iter.max) for the kmeans step are user-specified.

References

Ng, A., Jordan, M., & Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. “Advances in Neural Information processing systems”, 14.

Examples

Run this code

# load the Iris dataset
dat <- iris[,-5]

# calculate pairwise similarities using maximum likelihood cross validation
S <- kss(dat, bw = "np", npmethod = "cv.ml", cFUN = "c_gaussian", verbose = TRUE)

# cluster points using spectral clustering and compare to true class labels
cl <- spectral.clust(S$similarities, 3, is.sim = TRUE)
table(cl$clusters, iris[,5])

# try a different number of neighbours
cl2 <- spectral.clust(S$similarities, 3, is.sim = TRUE, neighbours = 4)
table(cl2$clusters, iris[,5])

Run the code above in your browser using DataLab