kernlab (version 0.9-24)

kkmeans: Kernel k-means

Description

A weighted kernel version of the famous k-means algorithm.

Usage

## S4 method for signature 'formula'
kkmeans(x, data = NULL, na.action = na.omit, ...)

## S4 method for signature 'matrix'
kkmeans(x, centers, kernel = "rbfdot", kpar = "automatic", alg = "kkmeans", p = 1, na.action = na.omit, ...)

## S4 method for signature 'kernelMatrix'
kkmeans(x, centers, ...)

## S4 method for signature 'list'
kkmeans(x, centers, kernel = "stringdot", kpar = list(length = 4, lambda = 0.5), alg = "kkmeans", p = 1, na.action = na.omit, ...)

Arguments

x
The matrix of data to be clustered, a symbolic description of the model to be fit (a formula), a kernel matrix of class kernelMatrix, or a list of character vectors.
data
An optional data frame containing the variables in the model. By default the variables are taken from the environment from which 'kkmeans' is called.
centers
Either the number of clusters or a matrix of initial cluster centers. If a number is given, a random initial partitioning is used.
kernel
The kernel function used in training and predicting. This parameter can be set to any function of class kernel that computes an inner product in feature space between two vector arguments (see kernels). kernlab provides the most popular kernel functions, which can be selected by setting the kernel parameter to one of the following strings:

  • rbfdot Radial Basis kernel "Gaussian"
  • polydot Polynomial kernel
  • vanilladot Linear kernel
  • tanhdot Hyperbolic tangent kernel
  • laplacedot Laplacian kernel
  • besseldot Bessel kernel
  • anovadot ANOVA RBF kernel
  • splinedot Spline kernel
  • stringdot String kernel

Setting the kernel parameter to "matrix" treats x as a kernel matrix, calling the kernelMatrix interface. The kernel parameter can also be set to a user-defined function of class kernel by passing the function name as an argument.

kpar
A character string or the list of hyper-parameters (kernel parameters). The default character string "automatic" uses a heuristic to determine a suitable value for the width parameter of the RBF kernel.

A list containing the parameters to be used with the kernel function can also be supplied. Valid parameters for existing kernels are:

  • sigma, the inverse kernel width, for the Radial Basis kernel function "rbfdot" and the Laplacian kernel "laplacedot".
  • degree, scale, offset for the Polynomial kernel "polydot".
  • scale, offset for the Hyperbolic tangent kernel function "tanhdot".
  • sigma, order, degree for the Bessel kernel "besseldot".
  • sigma, degree for the ANOVA kernel "anovadot".
  • length, lambda, normalized for the "stringdot" kernel, where length is the length of the strings considered, lambda is the decay factor, and normalized is a logical parameter determining whether the kernel evaluations should be normalized.

Hyper-parameters for user-defined kernels can be passed through the kpar parameter as well.
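As an illustration of the kernel and kpar arguments described above, the following sketch fits the same data with two different kernels, supplying sigma explicitly instead of relying on kpar = "automatic". The data set, the seed, and the value sigma = 0.1 are arbitrary choices made for this example and are not recommendations from the package.

```r
library(kernlab)

x <- as.matrix(iris[, -5])
set.seed(42)  # a numeric 'centers' starts from a random partition

# RBF kernel with an explicit inverse width (illustrative value)
fit_rbf <- kkmeans(x, centers = 3, kernel = "rbfdot",
                   kpar = list(sigma = 0.1))

# Laplacian kernel; it takes the same 'sigma' hyper-parameter
fit_lap <- kkmeans(x, centers = 3, kernel = "laplacedot",
                   kpar = list(sigma = 0.1))

# The kernel objects stored in the results show the parameters used
kernelf(fit_rbf)
kernelf(fit_lap)
```

Because the starting partition is random, cluster labels and sizes can differ between runs unless a seed is set.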

alg
The algorithm to use. Options currently include "kkmeans" and "kerninghan".
p
A parameter used to keep the affinity matrix positive semidefinite.
na.action
The action to perform on NA values.
...
Additional parameters.

Value

An S4 object of class specc, which extends the class vector and contains integers indicating the cluster to which each point is allocated. The following slots contain useful information:
centers
A matrix of cluster centers.
size
The number of points in each cluster.
withinss
The within-cluster sum of squares for each cluster.
kernelf
The kernel function used.

Details

Kernel k-means uses the 'kernel trick' (i.e. implicitly projecting all data into a non-linear feature space with the use of a kernel) to address one of the major drawbacks of k-means: it cannot capture clusters that are not linearly separable in input space. The algorithm is implemented using the triangle inequality to avoid unnecessary and computationally expensive distance calculations. This leads to a significant speedup, particularly on large data sets with a high number of clusters. With a particular choice of weights this algorithm becomes equivalent to the Kernighan-Lin and norm-cut graph partitioning algorithms.

The data can be passed to the kkmeans function as a matrix or a data.frame. In addition, kkmeans supports input in the form of a kernel matrix of class kernelMatrix, or a list of character vectors (for example, for text clustering), in which case a string kernel has to be used.
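To make the kernel trick concrete, here is a minimal base-R sketch of one reassignment step of unweighted kernel k-means. It is not kernlab's implementation: it ignores weights and the triangle-inequality speedup, and the helper names kernel_dist2 and rbf_matrix are hypothetical. It relies on the identity ||phi(x_i) - m_c||^2 = K_ii - (2/n_c) sum_{j in c} K_ij + (1/n_c^2) sum_{j,l in c} K_jl, so distances to cluster means in feature space are obtained from the kernel matrix alone.

```r
# Squared feature-space distance from every point to every cluster mean,
# computed from the kernel matrix K only (hypothetical helper).
kernel_dist2 <- function(K, cluster, k) {
  d2 <- matrix(0, nrow(K), k)
  for (cl in seq_len(k)) {
    idx <- which(cluster == cl)
    m <- length(idx)
    d2[, cl] <- diag(K) -
      2 * rowSums(K[, idx, drop = FALSE]) / m +
      sum(K[idx, idx, drop = FALSE]) / m^2
  }
  d2
}

# Gaussian (RBF) kernel matrix: exp(-sigma * ||x - y||^2)
rbf_matrix <- function(x, sigma) exp(-sigma * as.matrix(dist(x))^2)

x <- as.matrix(iris[, -5])
set.seed(1)
cluster <- sample(rep_len(1:3, nrow(x)))  # random initial partition
K <- rbf_matrix(x, sigma = 0.1)           # sigma chosen for illustration

# One reassignment step: move each point to its nearest cluster mean
d2 <- kernel_dist2(K, cluster, k = 3)
new_cluster <- max.col(-d2, ties.method = "first")
```

Repeating the distance and reassignment steps until the labels stop changing gives the basic iteration. With a linear kernel (K = x %*% t(x)) the same helper reproduces ordinary squared Euclidean distances to the centroids, which is a convenient sanity check.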

References

Inderjit Dhillon, Yuqiang Guan, Brian Kulis. A Unified View of Kernel k-means, Spectral Clustering and Graph Partitioning. UTCS Technical Report. http://web.cse.ohio-state.edu/~kulis/pubs/spectral_techreport.pdf

See Also

specc, kpca, kcca

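Examples

## Cluster the iris data set.
library(kernlab)
data(iris)

## Kernel k-means with 3 clusters (default RBF kernel, automatic sigma)
sc <- kkmeans(as.matrix(iris[,-5]), centers=3)

sc            # cluster assignment of each observation
centers(sc)   # matrix of cluster centers
size(sc)      # number of points in each cluster
withinss(sc)  # within-cluster sum of squares for each cluster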
    
    
