flexclust (version 1.4-0)

cclust: Convex Clustering

Description

Perform k-means clustering, hard competitive learning, or neural gas clustering on a data matrix.

Usage

cclust(x, k, dist = "euclidean", method = "kmeans",
       weights=NULL, control=NULL, group=NULL, simple=FALSE,
       save.data=FALSE)

Arguments

x

A numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).

k

Either the number of clusters, a vector of cluster assignments, or a matrix of initial (distinct) cluster centroids. If a number, a random set of (distinct) rows of x is chosen as the initial centroids. All three forms are sketched in the example after this argument list.

dist

Distance measure, one of "euclidean" (mean square distance) or "manhattan" (absolute distance).

method

Clustering algorithm: one of "kmeans", "hardcl" or "neuralgas", see details below.

weights

An optional vector of weights to be used in the fitting process. Works only in combination with hard competitive learning.

control

An object of class "cclustControl".

group

Currently ignored.

simple

Return an object of class "kccasimple"?

save.data

Save a copy of x in the return object?
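
As a minimal sketch (random data purely for illustration), the three forms of k could be supplied like this:

library(flexclust)
x <- matrix(rnorm(200), ncol=2)
cl1 <- cclust(x, k=4)                                   # number of clusters
cl2 <- cclust(x, k=sample(1:4, nrow(x), replace=TRUE))  # vector of cluster assignments
cl3 <- cclust(x, k=x[sample(nrow(x), 4), ])             # matrix of initial centroids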

Value

An object of class "kcca" (or of class "kccasimple" if simple=TRUE).
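
The usual accessor methods for "kcca" objects then apply; a brief sketch on toy data (the calls below are illustrative, not exhaustive):

cl <- cclust(matrix(rnorm(100), ncol=2), k=2)
clusters(cl)                                    # cluster membership of the training data
parameters(cl)                                  # matrix of fitted centroids
predict(cl, newdata=matrix(rnorm(10), ncol=2))  # assign new observations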

Details

This function uses the same computational engine as the earlier function of the same name from package cclust. The main difference is that it returns an S4 object of class "kcca", so all methods available for "kcca" objects can be used. By default, kcca and cclust use exactly the same algorithm, but cclust is usually much faster because it uses compiled code.
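
One rough way to see the speed difference is to time both functions on the same data (a sketch; absolute timings depend on your machine):

library(flexclust)
x <- matrix(rnorm(20000), ncol=2)
system.time(cclust(x, k=10))                             # compiled engine
system.time(kcca(x, k=10, family=kccaFamily("kmeans")))  # plain R engine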

If dist is "euclidean", the distance between the cluster center and the data points is the Euclidean distance (ordinary k-means algorithm), and cluster means are used as centroids. If "manhattan", the distance between the cluster center and the data points is the sum of the absolute values of the distances, and the column-wise cluster medians are used as centroids.
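
This correspondence can be checked empirically; a hedged sketch (after convergence, each "manhattan" centroid should approximately equal the column-wise medians of its cluster):

library(flexclust)
x <- matrix(rnorm(200), ncol=2)
cl <- cclust(x, k=2, dist="manhattan")
id <- predict(cl)
apply(x[id == 1, , drop=FALSE], 2, median)  # medians of cluster 1
cl@centers[1, ]                             # fitted centroid of cluster 1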

If method is "kmeans", the classic k-means algorithm as given by MacQueen (1967) is used, which works by repeatedly moving all cluster centers to the mean of their respective Voronoi sets. If "hardcl", on-line updates are used (also known as hard competitive learning), which work by randomly drawing an observation from x and moving the closest center towards that point (e.g., Ripley 1996). If "neuralgas", the neural gas algorithm by Martinetz et al. (1993) is used; it is similar to hard competitive learning, but in each iteration the second-closest centroid is moved in addition to the closest one.
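
The three algorithms can be run side by side for comparison; a minimal sketch (data and k are arbitrary choices here):

library(flexclust)
x <- matrix(rnorm(400), ncol=2)
for (m in c("kmeans", "hardcl", "neuralgas")) {
  cl <- cclust(x, k=3, method=m)
  cat(m, "cluster sizes:", table(clusters(cl)), "\n")
}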

References

MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam & J. Neyman, 1, pp. 281--297. Berkeley, CA: University of California Press.

Martinetz, T., Berkovich, S., and Schulten, K. (1993). "Neural-Gas" Network for Vector Quantization and its Application to Time-Series Prediction. IEEE Transactions on Neural Networks, 4 (4), pp. 558--569.

Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.

See Also

cclustControl-class, kcca

Examples

## a 2-dimensional example
## a 2-dimensional example: two Gaussian clusters
x <- rbind(matrix(rnorm(100, sd=0.3), ncol=2),
           matrix(rnorm(100, mean=1, sd=0.3), ncol=2))
cl <- cclust(x, 2)
plot(x, col=predict(cl))                   # color points by cluster
points(cl@centers, pch="x", cex=2, col=3)  # mark the fitted centroids

## a 3-dimensional example: three Gaussian clusters, over-fitted with k=6
x <- rbind(matrix(rnorm(150, sd=0.3), ncol=3),
           matrix(rnorm(150, mean=2, sd=0.3), ncol=3),
           matrix(rnorm(150, mean=4, sd=0.3), ncol=3))
cl <- cclust(x, 6, method="neuralgas", save.data=TRUE)
pairs(x, col=predict(cl))  # scatterplot matrix colored by cluster
plot(cl)                   # neighbourhood graph; needs save.data=TRUE
