kernlab (version 0.9-7)

specc: Spectral Clustering

Description

A spectral clustering algorithm. Clustering is performed by embedding the data into the subspace of the eigenvectors of an affinity matrix.

Usage

## S3 method for class 'formula':
specc(x, data = NULL, na.action = na.omit, ...)

## S3 method for class 'matrix':
specc(x, centers, kernel = "rbfdot", kpar = "automatic",
      nystrom.red = FALSE, nystrom.sample = dim(x)[1]/6,
      iterations = 200, mod.sample = 0.75, na.action = na.omit, ...)

## S3 method for class 'kernelMatrix':
specc(x, centers, nystrom.red = FALSE, iterations = 200, ...)

## S3 method for class 'list':
specc(x, centers, kernel = "stringdot",
      kpar = list(length=4, lambda=0.5), nystrom.red = FALSE,
      nystrom.sample = length(x)/6, iterations = 200,
      mod.sample = 0.75, na.action = na.omit, ...)

Arguments

x
the matrix of data to be clustered, or a symbolic description of the model to be fit, or a kernel matrix of class kernelMatrix, or a list of character vectors.
data
an optional data frame containing the variables in the model. By default the variables are taken from the environment from which specc is called.
centers
either the number of clusters or a set of initial cluster centers. If a number is given, a random set of rows of the eigenvector matrix is chosen as the initial centers.
kernel
the kernel function used in computing the affinity matrix. This parameter can be set to any function, of class kernel, which computes a dot product between two vector arguments. kernlab provides the most popular kernel functions which can be
kpar
a character string or the list of hyper-parameters (kernel parameters). The default character string "automatic" uses a heuristic to determine a suitable value for the width parameter of the RBF kernel. The second option "lo
nystrom.red
use the Nystrom method to calculate eigenvectors. When TRUE, a sample of the dataset is used to calculate the eigenvalues, so only an $n \times m$ matrix, where $n$ is the sample size, is stored in memory (default: FALSE)
nystrom.sample
number of data points to use for estimating the eigenvalues when using the Nystrom method (default: dim(x)[1]/6)
mod.sample
proportion of data to use when estimating sigma (default: 0.75)
iterations
the maximum number of iterations allowed.
na.action
the action to perform on NA
...
additional parameters

Value

An S4 object of class specc, which extends the class vector and contains integers indicating the cluster to which each point is allocated. The following slots contain useful information:

  • centers: A matrix of cluster centers.
  • size: The number of points in each cluster.
  • withinss: The within-cluster sum of squares for each cluster.
  • kernelf: The kernel function used.

Details

Spectral clustering works by embedding the data points of the partitioning problem into the subspace spanned by the eigenvectors corresponding to the $k$ largest eigenvalues of a normalized affinity/kernel matrix. Using a simple clustering method like kmeans on the embedded points usually leads to good performance. It can be shown that spectral clustering methods boil down to graph partitioning.

The data can be passed to the specc function in a matrix or a data.frame. In addition, specc supports input in the form of a kernel matrix of class kernelMatrix, or as a list of character vectors, in which case a string kernel has to be used.
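The embedding described above can be illustrated with a minimal base R sketch that follows the steps of the Ng, Jordan and Weiss paper cited in the References. It is not the kernlab implementation: the function name njw_sketch, the fixed sigma value and the toy data are assumptions made for illustration only.

```r
# Minimal sketch of spectral clustering in base R (no kernlab needed).
# njw_sketch and its arguments are hypothetical names for illustration.
njw_sketch <- function(x, k, sigma = 1) {
  x <- as.matrix(x)
  # RBF affinity matrix A_ij = exp(-sigma * ||x_i - x_j||^2), zero diagonal
  A <- exp(-sigma * as.matrix(dist(x))^2)
  diag(A) <- 0
  # Symmetric normalization L = D^{-1/2} A D^{-1/2}
  d <- 1 / sqrt(rowSums(A))
  L <- A * outer(d, d)
  # Eigenvectors belonging to the k largest eigenvalues
  V <- eigen(L, symmetric = TRUE)$vectors[, 1:k, drop = FALSE]
  # Rescale every row to unit length, then cluster the embedded points
  Y <- V / sqrt(rowSums(V^2))
  kmeans(Y, centers = k, nstart = 10)$cluster
}

set.seed(1)
# Toy data: two separated Gaussian blobs with 20 points each
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 2, sd = 0.3), ncol = 2))
cl <- njw_sketch(x, k = 2, sigma = 1)
table(cl)
```

The row rescaling step places the embedded points on the unit sphere, which is why a plain kmeans run is usually enough to separate them.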

References

Andrew Y. Ng, Michael I. Jordan, Yair Weiss. On Spectral Clustering: Analysis and an Algorithm. Neural Information Processing Systems (NIPS), 2001. http://www.nips.cc/NIPS2001/papers/psgz/AA35.ps.gz

See Also

kkmeans, kpca, kcca

Examples

## Cluster the spirals data set.
data(spirals)

sc <- specc(spirals, centers=2)

sc
centers(sc)
size(sc)
withinss(sc)

plot(spirals, col=sc)
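
The other input forms listed under Arguments can be tried in the same way. The sketch below assumes kernlab is installed and uses sigma = 100 purely as an illustrative kernel width, not a recommended value. It passes an explicit kpar list instead of the "automatic" heuristic, then hands a precomputed kernel matrix to the kernelMatrix method.

```r
library(kernlab)
data(spirals)

# Explicit RBF kernel width instead of kpar = "automatic".
# sigma = 100 is an arbitrary illustrative choice.
sc_rbf <- specc(spirals, centers = 2, kernel = "rbfdot",
                kpar = list(sigma = 100))

# Precomputed kernel matrix, dispatched to the 'kernelMatrix' method.
K <- kernelMatrix(rbfdot(sigma = 100), spirals)
sc_km <- specc(K, centers = 2)

# Cluster sizes from both runs
size(sc_rbf)
size(sc_km)
```

Because the initial centers are chosen at random, cluster labels and sizes may differ between runs unless a seed is set.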
