flexclust (version 1.3-4)

kcca: K-Centroids Cluster Analysis

Description

Perform k-centroids clustering on a data matrix.

Usage

kcca(x, k, family=kccaFamily("kmeans"), weights=NULL, group=NULL,
     control=NULL, simple=FALSE, save.data=FALSE)
kccaFamily(which=NULL, dist=NULL, cent=NULL, name=which,
           preproc = NULL, trim=0, groupFun = "minSumClusters")

## S4 method for signature 'kcca'
summary(object)

Arguments

x
A numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).
k
Either the number of clusters, or a vector of cluster assignments, or a matrix of initial (distinct) cluster centroids. If a number, a random set of (distinct) rows in x is chosen as the initial centroids.
family
Object of class kccaFamily.
weights
An optional vector of weights to be used in the clustering process. Not all families support weights.
group
An optional grouping vector for the data, see section Group Constraints below.
control
An object of class flexclustControl; as the examples show, a list of control parameters can also be given.
simple
Return an object of class kccasimple?
save.data
Save a copy of x in the return object?
which
One of "kmeans", "kmedians", "angle", "jaccard", or "ejaccard".
name
Optional long name for family, used only for show methods.
dist
A function for distance computation, ignored if which is specified.
cent
A function for centroid computation, ignored if which is specified.
preproc
Function for data preprocessing.
trim
A number between 0 and 0.5. If non-zero, trimmed means are used for the kmeans family; ignored by all other families.
groupFun
Function or name of function to obtain clusters for grouped data, see section Group Constraints below.
object
Object of class "kcca".
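The dist and cent arguments accept user-supplied functions. The base-R sketch below assumes that dist(x, centers) must return an n x k matrix of distances and that cent(x) must return the centroid of the rows of one cluster as a vector; treat that interface as an assumption, and the function names as hypothetical. It also shows a coordinate-wise trimmed mean, which is one plausible (assumed, not confirmed) reading of what argument trim does for the kmeans family.

```r
## Hypothetical user-supplied family functions (base R only).
## Assumed interface: dist(x, centers) returns an n x k distance matrix,
## cent(x) returns the centroid of the rows of x as a vector.

## Manhattan distance from every row of x to every row of centers
my_dist_manhattan <- function(x, centers) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  d <- matrix(0, nrow = nrow(x), ncol = nrow(centers))
  for (j in seq_len(nrow(centers))) {
    ## sum of absolute coordinate differences to centroid j
    d[, j] <- rowSums(abs(sweep(x, 2, centers[j, ])))
  }
  d
}

## Coordinate-wise median as centroid
my_cent_median <- function(x) apply(x, 2, median)

## Coordinate-wise trimmed mean (assumed reading of argument 'trim')
my_cent_trimmed <- function(x, trim = 0.1) apply(x, 2, mean, trim = trim)

pts <- rbind(c(0, 0), c(1, 2), c(10, 10))
ctr <- rbind(c(0, 0), c(10, 10))
my_dist_manhattan(pts, ctr)   # rows: (0, 20), (3, 17), (20, 0)
my_cent_median(pts)           # 1 2

## 10% trimming drops the smallest and largest value in each column,
## so the outlier 100 no longer pulls the centroid
out <- cbind(c(1:9, 100), c(1:9, 100))
my_cent_trimmed(out, trim = 0.1)  # 5.5 5.5
colMeans(out)                     # 14.5 14.5
```

If the assumed interface holds, such functions could be combined as kccaFamily(dist = my_dist_manhattan, cent = my_cent_median), which would roughly mirror the predefined "kmedians" family.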

Value

Function kcca returns objects of class "kcca" or "kccasimple" depending on the value of argument simple. The simpler objects contain fewer slots and hence are faster to compute, but contain no auxiliary information used by the plotting methods. Most plot methods for "kccasimple" objects do nothing and return a warning. If only centroids, cluster membership or prediction for new data are of interest, then the simple objects are sufficient.

Predefined Families

Function kccaFamily() currently has the following predefined families (distance / centroid):
kmeans:
Euclidean distance / mean
kmedians:
Manhattan distance / median
angle:
angle between observation and centroid / standardized mean
jaccard:
Jaccard distance / numeric optimization
ejaccard:
Jaccard distance / mean
See Leisch (2006) for details on all combinations.
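All families listed above plug into the same alternating scheme: assign each observation to its closest centroid under the family's distance, then recompute each centroid with the family's centroid function. The base-R sketch below is illustrative only and is not kcca's actual implementation (which may, for example, handle empty clusters and convergence differently); kcentroids_sketch, dist_euclidean and dist_manhattan are hypothetical names. As with argument k of kcca, its k may be a number (randomly chosen rows of x as starting centroids) or a matrix of initial centroids.

```r
## Illustrative k-centroids loop (not kcca's actual code).
## k: number of clusters, or a matrix of initial centroids.
kcentroids_sketch <- function(x, k, dist, cent, iter.max = 100) {
  x <- as.matrix(x)
  if (is.matrix(k)) {
    centers <- k
  } else {
    ## k randomly chosen rows of x as starting centroids
    centers <- x[sample(nrow(x), k), , drop = FALSE]
  }
  cluster <- integer(nrow(x))
  for (iter in seq_len(iter.max)) {
    ## Assignment step: closest centroid under the family's distance
    new_cluster <- apply(dist(x, centers), 1, which.min)
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    ## Centroid step: recompute each centroid from its members;
    ## an empty cluster simply keeps its previous centroid here
    for (j in seq_len(nrow(centers))) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- cent(members)
    }
  }
  list(cluster = cluster, centers = centers, iterations = iter)
}

## Distances for the "kmeans" and "kmedians" combinations
dist_euclidean <- function(x, centers)
  sapply(seq_len(nrow(centers)), function(j)
    sqrt(rowSums(sweep(x, 2, centers[j, ])^2)))
dist_manhattan <- function(x, centers)
  sapply(seq_len(nrow(centers)), function(j)
    rowSums(abs(sweep(x, 2, centers[j, ]))))

pts  <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
init <- rbind(c(0, 0), c(0, 1))   # both start in the lower-left group

fit_mean   <- kcentroids_sketch(pts, init, dist_euclidean, colMeans)
fit_median <- kcentroids_sketch(pts, init, dist_manhattan,
                                function(m) apply(m, 2, median))
fit_mean$cluster    # 1 1 1 2 2 2
fit_mean$centers    # (1/3, 1/3) and (31/3, 31/3)
fit_median$centers  # (0, 0) and (10, 10)
```

Both runs recover the same partition from a poor start, but the centroids differ: the mean is pulled toward all members, while the coordinate-wise median lands on observed coordinates.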

Group Constraints

If group is not NULL, then observations from the same group are restricted to belong to the same cluster (must-link constraint) or to different clusters (cannot-link constraint) during the fitting process. Which constraint is used, and how, is controlled by argument groupFun.

If groupFun = "minSumClusters", then all group members are assigned to the cluster whose center has minimal average distance to the group members. If groupFun = "majorityClusters", then all group members are assigned to the cluster the majority of them would belong to without a constraint.

groupFun = "differentClusters" implements a cannot-link constraint, i.e., members of one group are not allowed to belong to the same cluster. The optimal allocation for each group is found by solving a linear sum assignment problem using solve_LSAP (package clue). In this case the group sizes must be smaller than the number of clusters.

Ties are broken at random in all cases. Note that at the moment not all methods for fitted "kcca" objects respect the grouping information; most importantly, the plot method does not when a data argument is specified.
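The two must-link rules can be sketched in base R, given a matrix d of distances from each observation to each current centroid. group_min_sum and group_majority are hypothetical helpers, not flexclust functions. Unlike kcca, they break ties deterministically (which.min and which.max take the first candidate) rather than at random, and the cannot-link rule "differentClusters", which needs an assignment solver, is not covered. The data are chosen so that the rules disagree: two of the three members of group "a" prefer cluster 1, but the third is so close to centroid 2 that the group's average distance is smallest there.

```r
## Hypothetical helpers; d is an n x k matrix of distances from each
## observation to each centroid, group a grouping vector of length n.

## "minSumClusters"-style rule: the whole group goes to the centroid
## with minimal average distance to its members
group_min_sum <- function(d, group) {
  cl <- integer(nrow(d))
  for (g in unique(group)) {
    idx <- which(group == g)
    cl[idx] <- which.min(colMeans(d[idx, , drop = FALSE]))
  }
  cl
}

## "majorityClusters"-style rule: the whole group goes to the cluster
## most of its members would choose without the constraint
group_majority <- function(d, group) {
  free <- apply(d, 1, which.min)   # unconstrained assignment
  cl <- integer(nrow(d))
  for (g in unique(group)) {
    idx <- which(group == g)
    cl[idx] <- which.max(tabulate(free[idx], nbins = ncol(d)))
  }
  cl
}

d <- rbind(c(1, 2), c(1, 2), c(9, 0), c(0, 5))
group <- c("a", "a", "a", "b")
group_min_sum(d, group)   # 2 2 2 1
group_majority(d, group)  # 1 1 1 1
```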

Details

See the paper "A Toolbox for K-Centroids Cluster Analysis" (Leisch, 2006) referenced below for details.

References

Friedrich Leisch. A Toolbox for K-Centroids Cluster Analysis. Computational Statistics and Data Analysis, 51(2), 526-544, 2006.

Friedrich Leisch and Bettina Gruen. Extending standard cluster algorithms to allow for group constraints. In Alfredo Rizzi and Maurizio Vichi, editors, Compstat 2006 - Proceedings in Computational Statistics, pages 885-892. Physica Verlag, Heidelberg, Germany, 2006.

See Also

stepFlexclust, cclust, distances

Examples

library("flexclust")
data("Nclus")
plot(Nclus)

## try kmeans 
cl1 = kcca(Nclus, k=4)
cl1

image(cl1)
points(Nclus)

## A barplot of the centroids 
barplot(cl1)


## now use k-medians and kmeans++ initialization, cluster centroids
## should be similar...

cl2 = kcca(Nclus, k=4, family=kccaFamily("kmedians"),
           control=list(initcent="kmeanspp"))
cl2

## ... but the boundaries of the partitions have a different shape
image(cl2)
points(Nclus)
