cba (version 0.1-6)

rockCluster: Rock Clustering

Description

Cluster a data matrix using the Rock algorithm.

Usage

rockCluster(x, n, beta = 1-theta, theta = 0.5, fun = "dists", funArgs = list(method="binary"), debug = FALSE)

Arguments

x
a data matrix
n
the number of desired clusters
beta
optional distance threshold
theta
neighborhood parameter in the range [0,1)
fun
distance function to use
funArgs
a list of named parameter arguments to fun
debug
turn on/off debugging output

Value

An object of class rock. A list with the following elements:

  • x: the data matrix or a subset of it.
  • cl: a factor of cluster labels.
  • size: a vector of cluster sizes.
  • beta: see above.
  • theta: see above.

Details

The intended area of application is the clustering of binary (logical) data, for instance as a preprocessing step in data mining. However, arbitrary distance metrics can be used (see dists).

According to the reference (see below), the distance threshold and the neighborhood parameter are coupled; by default beta = 1 - theta, as shown under Usage. Thus, higher values of the neighborhood parameter theta pose a tighter constraint on the neighborhood. For any two data points, the neighborhood is defined as the number of other data points that are neighbors of both. Further, points are neighbors (or linked) only if their distance is less than or equal to beta.
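
The neighbor and neighborhood definitions above can be illustrated with a small base R sketch. It is not the package's implementation: the toy data are made up, stats::dist with method = "binary" stands in for the package's distance function, and a point is assumed not to count as its own neighbor (the text only speaks of "other data points").

```r
# Toy binary data: 5 observations, 4 attributes (hypothetical values).
x <- matrix(c(1, 1, 0, 0,
              1, 1, 1, 0,
              1, 0, 1, 0,
              0, 0, 1, 1,
              0, 0, 0, 1), ncol = 4, byrow = TRUE)

theta <- 0.4
beta  <- 1 - theta           # default coupling from the Usage section

# Binary distances from base R (a stand-in for the package's dists).
d <- as.matrix(dist(x, method = "binary"))

# Two points are neighbors if their distance is at most beta.
nb <- d <= beta
diag(nb) <- FALSE            # assumption: a point is not its own neighbor

# links[i, j] = number of points that are neighbors of both i and j.
# nb is symmetric, so crossprod(nb) equals nb %*% nb.
links <- crossprod(nb * 1)

print(nb)
print(links)
```

In this toy data, points 1 and 3 are not neighbors of each other but share point 2 as a common neighbor, so their link count is 1. Raising theta lowers beta and removes neighbor pairs, which is the tightening effect described above.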

Note that for a tight neighborhood specification the algorithm may run out of clusters to merge, i.e. it may terminate with more than the desired number of clusters. The debug option can help in determining proper settings: lines suffixed with a plus sign indicate that non-singleton clusters were merged. Note also that tie-breaking is not implemented, i.e. the first maximum encountered is used. However, permuting the order of the data can help in determining how strongly a solution depends on ties.
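
The permutation check mentioned above could be sketched as follows. The helper compare_orderings is hypothetical and not part of the package; to keep the sketch runnable without the package, it is demonstrated with a base R stand-in clustering (hclust plus cutree) on simulated binary data.

```r
# Hypothetical helper: cluster the data in the given row order and in a
# random row order, then cross-tabulate the two label vectors.
# `cluster_fun` must take a matrix and return one label per row.
compare_orderings <- function(x, cluster_fun) {
  perm <- sample(nrow(x))
  cl_orig <- cluster_fun(x)
  cl_perm <- cluster_fun(x[perm, , drop = FALSE])
  # order(perm) inverts the permutation, so labels line up with rows of x.
  table(original = cl_orig, permuted = cl_perm[order(perm)])
}

# Stand-in clustering from base R (not the Rock algorithm).
hc_labels <- function(m) cutree(hclust(dist(m, method = "binary")), k = 2)

# Simulated binary data: two groups of 10 rows with different densities.
set.seed(1)
x <- rbind(matrix(rbinom(100, 1, 0.9), ncol = 10),
           matrix(rbinom(100, 1, 0.1), ncol = 10))

tab <- compare_orderings(x, hc_labels)
# Cluster labels may be renumbered between runs, so look for one dominant
# cell per row rather than a strict diagonal.
print(tab)
```

With the package loaded, a function such as function(m) fitted(rockCluster(m, n = 2, theta = 0.73))$cl could presumably be passed as cluster_fun, following the calls used in the Examples section. Rows of the table spread over several columns would suggest that the solution depends on the data order.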

References

S. Guha, R. Rastogi, and K. Shim. ROCK: A Robust Clustering Algorithm for Categorical Attributes. Information Systems, Vol. 25, No. 5, 2000.

See Also

dists for common distance functions, predict for classifying new data samples, and fitted for classifying the clustered data samples.

Examples

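library(cba)

### example from paper
data(Votes)
# drop column 17 and dummy-code the remaining columns
x <- as.dummy(Votes[-17])
rc <- rockCluster(x, n=2, theta=0.73, debug=TRUE)
print(rc)
# classify the clustered samples and compare with the known classes
rf <- fitted(rc)
table(Votes$Class, rf$cl)

### large example from paper
data("Mushroom")
# drop the first column and dummy-code the remaining columns
x <- as.dummy(Mushroom[-1])
# cluster a random sample of 1000 rows, then classify all rows
rc <- rockCluster(x[sample(dim(x)[1],1000),], n=10, theta=0.8)
print(rc)
rp <- predict(rc, x)
table(Mushroom$classes, rp$cl)

### real valued example
# Gaussian-type transform that maps distances into [0, 1)
gdist <- function(x, y=NULL) 1-exp(-dists(x, y)^2)
# two clouds of 50 points each, centered at (1, 1) and (-1, -1)
x <- matrix(rnorm(200, sd=0.6)+rep(rep(c(1,-1),each=50),2), ncol=2)
rc <- rockCluster(x, n=2, theta=0.75, fun=gdist, funArgs=NULL)
print(rc)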