R/Weka Clusterers

R interfaces to Weka clustering algorithms.

Cobweb(x, control = NULL)
FarthestFirst(x, control = NULL)
SimpleKMeans(x, control = NULL)
XMeans(x, control = NULL)
DBScan(x, control = NULL)

x: an R object with the data to be clustered.
control: an object of class Weka_control, or a character vector of control options, or NULL (default). Available options can be obtained on-line using the Weka Option Wizard WOW.

There is a predict method for predicting class ids or memberships from the fitted clusterers.

Cobweb implements the Cobweb (Fisher, 1987) and Classit (Gennari et al., 1989) clustering algorithms.

FarthestFirst provides the farthest first traversal algorithm by Hochbaum and Shmoys (1985), which works as a fast, simple, approximate clusterer modeled after simple $k$-means.
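To make the farthest first traversal idea concrete, here is a minimal base R sketch. It is an illustration only, not Weka's implementation: the function name farthest_first, its arguments, and the fixed starting row (Weka's clusterer may choose its first center differently) are all assumptions made for this example. Each new center is the point farthest from the centers chosen so far, and every point is then assigned to its nearest center.

```r
# Illustrative farthest first traversal (hypothetical helper, not part of RWeka).
farthest_first <- function(x, k, first = 1L) {
  x <- as.matrix(x)
  # Euclidean distance from every row of x to row i.
  dist_to <- function(i) sqrt(rowSums(sweep(x, 2, x[i, ])^2))

  centers <- integer(k)
  centers[1] <- first          # assumption: fixed start for reproducibility
  d <- dist_to(first)          # distance to the nearest center chosen so far

  for (j in seq_len(k)[-1]) {
    centers[j] <- which.max(d) # farthest point from all current centers
    d <- pmin(d, dist_to(centers[j]))
  }

  # Assign each point to its nearest center (column index of the minimum).
  dmat <- vapply(centers, dist_to, numeric(nrow(x)))
  list(centers = centers,
       cluster = max.col(-dmat, ties.method = "first"))
}

# Usage on the same data as the package examples.
ff <- farthest_first(iris[, -5], k = 3)
table(ff$cluster, iris$Species)
```

Only k - 1 passes over the data are needed to pick the centers, which is why the method is fast compared with iterating $k$-means to convergence.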

SimpleKMeans provides clustering with the $k$-means algorithm. XMeans provides $k$-means extended by an Improve-Structure part and automatically determines the number of clusters.

DBScan provides the density-based clustering algorithm by Ester, Kriegel, Sander, and Xu. Note that noise points are assigned to NA.
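Because noise points come back as NA, downstream summaries need to handle missing ids explicitly. The following base R sketch uses a made-up vector of cluster ids (the values are hypothetical, not output from DBScan) to show how noise can be counted and filtered.

```r
# Hypothetical cluster ids in the shape DBScan is documented to return:
# integer ids for clustered points, NA for noise.
ids <- c(0L, 0L, 1L, NA, 1L, NA, 0L)

# table() silently drops NA by default; request it to see the noise count.
counts <- table(ids, useNA = "ifany")

# Logical mask for noise, and the ids of the points that were clustered.
is_noise <- is.na(ids)
clustered <- ids[!is_noise]
```

The same mask can be used to subset the rows of the original data before computing per-cluster statistics.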


A list inheriting from class Weka_clusterers with components including

  • clusterer: a reference (of class jobjRef) to a Java object obtained by applying the Weka buildClusterer method to the training instances using the given control options.
  • class_ids: a vector of integers indicating the class to which each training instance is allocated (the results of calling the Weka clusterInstance method for the built clusterer and each instance).
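As a small illustration of working with the returned list, the sketch below groups training row indices by their entry in class_ids. The helper name members_by_cluster and the stand-in object mock_fit are hypothetical; the mock carries only the documented class_ids component so that the code runs without Weka installed.

```r
# Hypothetical helper: row indices of the training instances in each cluster.
members_by_cluster <- function(fit) {
  split(seq_along(fit$class_ids), fit$class_ids)
}

# Stand-in for a fitted clusterer (made-up ids, no Java object attached).
mock_fit <- list(clusterer = NULL, class_ids = c(0L, 1L, 0L, 2L, 1L))
members <- members_by_cluster(mock_fit)
```

With RWeka loaded, the same helper could be applied to a real fit, for example the result of SimpleKMeans(iris[, -5], Weka_control(N = 3)).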


XMeans requires Weka package XMeans to be installed.

DBScan requires Weka package optics_dbScan to be installed.
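One way to install these Weka packages from within R is the Weka package manager interface WPM provided by RWeka. The wrapper below is a sketch under assumptions: the function name install_weka_packages is hypothetical, and it presumes a working Java setup and network access. It is defined but not called here.

```r
# Hypothetical convenience wrapper around RWeka's WPM() package manager interface.
install_weka_packages <- function(pkgs = c("XMeans", "optics_dbScan")) {
  if (!requireNamespace("RWeka", quietly = TRUE)) {
    stop("Package 'RWeka' is required but could not be loaded.")
  }
  # Refresh the local cache of Weka package metadata, then install each package.
  RWeka::WPM("refresh-cache")
  for (p in pkgs) {
    RWeka::WPM("install-package", p)
  }
  invisible(pkgs)
}

# Example call (commented out because it downloads from the network):
# install_weka_packages()
```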


M. Ester, H.-P. Kriegel, J. Sander, and X. Xu (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD'96), Portland, OR, 226--231. AAAI Press.

D. H. Fisher (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2/2, 139--172.

J. Gennari, P. Langley, and D. H. Fisher (1989). Models of incremental concept formation. Artificial Intelligence, 40, 11--62.

D. S. Hochbaum and D. B. Shmoys (1985). A best possible heuristic for the $k$-center problem. Mathematics of Operations Research, 10(2), 180--184.

D. Pelleg and A. W. Moore (2000). X-means: Extending K-means with Efficient Estimation of the Number of Clusters. In: Seventeenth International Conference on Machine Learning, 727--734. Morgan Kaufmann.

I. H. Witten and E. Frank (2005). Data Mining: Practical Machine Learning Tools and Techniques. 2nd Edition, Morgan Kaufmann, San Francisco.

cl1 <- SimpleKMeans(iris[, -5], Weka_control(N = 3))
table(predict(cl1), iris$Species)

## Requires Weka package 'XMeans' to be installed.
## Use XMeans with a KDTree.
cl2 <- XMeans(iris[, -5],
              c("-L", 3, "-H", 7, "-use-kdtree",
                "-K", "weka.core.neighboursearch.KDTree -P"))
table(predict(cl2), iris$Species)
Documentation reproduced from package RWeka, version 0.4-18, License: GPL-2
