A qkernel spectral clustering algorithm. Clustering is performed by embedding the data into the subspace of the eigenvectors of a graph Laplacian matrix.
# S4 method for matrix
qkspecc(x, kernel = "rbfbase", qpar = list(sigma = 2, q = 0.9),
        Nocent = NA, normalize = "symmetric", maxk = 20, iterations = 200,
        na.action = na.omit, ...)

# S4 method for cndkernmatrix
qkspecc(x, Nocent = NA, normalize = "symmetric",
        maxk = 20, iterations = 200, ...)

# S4 method for qkernmatrix
qkspecc(x, Nocent = NA, normalize = "symmetric",
        maxk = 20, iterations = 200, ...)
x
the matrix of data to be clustered, or a kernel matrix of class qkernmatrix or cndkernmatrix.
kernel
the kernel function used in computing the affinity matrix. This parameter can be set to any function of class kernel that computes a kernel function value between two vector arguments. qkerntool provides the most popular kernel functions, which can be used by setting the kernel parameter to one of the following strings:
rbfbase
Radial Basis qkernel function "Gaussian"
nonlbase
Non Linear qkernel function
laplbase
Laplacian qkernel function
ratibase
Rational Quadratic qkernel function
multbase
Multiquadric qkernel function
invbase
Inverse Multiquadric qkernel function
wavbase
Wave qkernel function
powbase
Power qkernel function
logbase
Log qkernel function
caubase
Cauchy qkernel function
chibase
Chi-Square qkernel function
studbase
Generalized T-Student qkernel function
nonlcnd
Non Linear cndkernel function
polycnd
Polynomial cndkernel function
rbfcnd
Radial Basis cndkernel function "Gaussian"
laplcnd
Laplacian cndkernel function
anocnd
ANOVA cndkernel function
raticnd
Rational Quadratic cndkernel function
multcnd
Multiquadric cndkernel function
invcnd
Inverse Multiquadric cndkernel function
wavcnd
Wave cndkernel function
powcnd
Power cndkernel function
logcnd
Log cndkernel function
caucnd
Cauchy cndkernel function
chicnd
Chi-Square cndkernel function
studcnd
Generalized T-Student cndkernel function
The kernel parameter can also be set to a user defined function of class kernel by passing the function name as an argument.
qpar
a character string or a list of hyper-parameters (kernel parameters). The default is the list list(sigma = 2, q = 0.9). A character string selects a heuristic that determines a suitable value for the width parameter of the RBF kernel; the option "local" (local scaling) uses a more advanced heuristic and sets a width parameter for every point in the data set, which is particularly useful when the data incorporates multiple scales.
A list can also be used containing the parameters to be used with the kernel function. Valid parameters for existing kernels are:
sigma, q
for the Radial Basis qkernel function "rbfbase", the Laplacian qkernel function "laplbase" and the Cauchy qkernel function "caubase".
alpha, q
for the Non Linear qkernel function "nonlbase".
c, q
for the Rational Quadratic qkernel function "ratibase", the Multiquadric qkernel function "multbase" and the Inverse Multiquadric qkernel function "invbase".
theta, q
for the Wave qkernel function "wavbase".
d, q
for the Power qkernel function "powbase", the Log qkernel function "logbase" and the Generalized T-Student qkernel function "studbase".
alpha
for the Non Linear cndkernel function "nonlcnd".
d, alpha, c
for the Polynomial cndkernel function "polycnd".
gamma
for the Radial Basis cndkernel function "rbfcnd", the Laplacian cndkernel function "laplcnd" and the Cauchy cndkernel function "caucnd".
d, sigma
for the ANOVA cndkernel function "anocnd".
c
for the Rational Quadratic cndkernel function "raticnd", the Multiquadric cndkernel function "multcnd" and the Inverse Multiquadric cndkernel function "invcnd".
theta
for the Wave cndkernel function "wavcnd".
d
for the Power cndkernel function "powcnd", the Log cndkernel function "logcnd" and the Generalized T-Student cndkernel function "studcnd".
where length is the length of the strings considered, lambda the
decay factor and normalized a logical parameter determining if the
kernel evaluations should be normalized.
Hyper-parameters for user defined kernels can be passed through the qpar parameter as well.
Nocent
the number of clusters.
normalize
normalisation of the Laplacian ("none", "symmetric" or "random-walk").
maxk
if Nocent is NA, an upper bound on the number of clusters for the automatic estimation. Defaults to 20.
iterations
the maximum number of iterations allowed.
na.action
the action to perform on NA.
...
additional parameters.
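As a rough illustration of what the three Laplacian normalisation settings ("none", "symmetric", "random-walk") usually denote in the spectral clustering literature, the base-R sketch below builds all three from a toy affinity matrix. The helper name laplacian, the Gaussian affinity and the toy data are assumptions made for illustration only; the exact matrices that qkspecc forms internally are not documented here and may differ.

```r
# Toy data: 10 points in 2-D (hypothetical, for illustration only)
set.seed(1)
x <- matrix(rnorm(20), ncol = 2)

# Gaussian affinity used as a stand-in for a qkernel matrix (assumption)
W <- exp(-as.matrix(dist(x))^2 / 2)

# Hypothetical helper, not part of qkerntool: textbook graph Laplacians
laplacian <- function(W, normalize = c("none", "symmetric", "random-walk")) {
  normalize <- match.arg(normalize)
  d <- rowSums(W)              # vertex degrees
  L <- diag(d) - W             # unnormalised Laplacian D - W
  switch(normalize,
         "none"        = L,
         "symmetric"   = diag(1 / sqrt(d)) %*% L %*% diag(1 / sqrt(d)),
         "random-walk" = diag(1 / d) %*% L)
}

L_none <- laplacian(W, "none")
L_sym  <- laplacian(W, "symmetric")
L_rw   <- laplacian(W, "random-walk")
```

In this textbook form the symmetric variant stays symmetric, so a symmetric eigen-solver can be used on it, while the rows of the unnormalised and random-walk variants sum to zero.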
An S4 object of class qkspecc which extends the class vector, containing integers indicating the cluster to which each point is allocated. The following slots contain useful information:
The cluster assignments
The corresponding eigenvector
The corresponding eigenvalues
The eigenvectors corresponding to the \(k\) smallest eigenvalues of the graph Laplacian matrix.
The qkernel spectral clustering works by embedding the data points of the partitioning problem into the subspace of the eigenvectors corresponding to the \(k\) smallest eigenvalues of the graph Laplacian matrix. Using a simple clustering method like kmeans on the embedded points usually leads to good performance. It can be shown that qkernel spectral clustering methods boil down to graph partitioning.
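The embedding-then-kmeans idea can be sketched in base R as follows. This is a minimal illustration in the spirit of Ng, Jordan and Weiss, not the qkspecc implementation: the Gaussian affinity with sigma = 1, the row normalisation step, the toy two-blob data and all variable names are assumptions.

```r
set.seed(42)

# Two well-separated blobs of 30 points each (hypothetical toy data)
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(60, mean = 4, sd = 0.5), ncol = 2))
k <- 2                                   # number of clusters
n <- nrow(x)

# Gaussian affinity as a stand-in for a qkernel matrix (assumption)
W <- exp(-as.matrix(dist(x))^2 / 2)
diag(W) <- 0

# Symmetric normalised Laplacian I - D^{-1/2} W D^{-1/2}
d <- rowSums(W)
L_sym <- diag(n) - diag(1 / sqrt(d)) %*% W %*% diag(1 / sqrt(d))

# eigen() returns eigenvalues in decreasing order, so the eigenvectors of
# the k smallest eigenvalues are the last k columns
e <- eigen(L_sym, symmetric = TRUE)
Y <- e$vectors[, (n - k + 1):n, drop = FALSE]

# Normalise each embedded point to unit length, then cluster with kmeans
Y <- Y / sqrt(rowSums(Y^2))
cl <- kmeans(Y, centers = k, nstart = 10)$cluster
```

On data this cleanly separated, each blob should end up with a single label; harder data sets depend strongly on the kernel and its hyper-parameters.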
The data can be passed to the qkspecc function in a matrix; in addition, qkspecc also supports input in the form of a kernel matrix of class qkernmatrix or cndkernmatrix.
Andrew Y. Ng, Michael I. Jordan, Yair Weiss. On Spectral Clustering: Analysis and an Algorithm. Neural Information Processing Symposium 2001.
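# NOT RUN {
library(qkerntool)

# Use the four numeric iris measurements as the data matrix
data("iris")
x <- as.matrix(iris[, -5])

# Cluster directly from the data matrix with the RBF qkernel
qspe <- qkspecc(x, kernel = "rbfbase", qpar = list(sigma = 10, q = 0.9),
                Nocent = 3, normalize = "symmetric", maxk = 15, iterations = 1200)
plot(x, col = clust(qspe))

# Cluster from a precomputed qkernel matrix
qkfunc <- nonlbase(alpha = 1/15, q = 0.8)
Ktrain <- qkernmatrix(qkfunc, x)
qspe <- qkspecc(Ktrain, Nocent = 3, normalize = "symmetric", maxk = 20)
plot(x, col = clust(qspe))
# }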