R interfaces to Weka clustering algorithms.

```
Cobweb(x, control = NULL)
FarthestFirst(x, control = NULL)
SimpleKMeans(x, control = NULL)
XMeans(x, control = NULL)
DBScan(x, control = NULL)
```

x

an R object with the data to be clustered.

control

an object of class `Weka_control`, or a character vector of control
options, or `NULL` (default). Available options can be obtained on-line
using the Weka Option Wizard `WOW`, or the Weka documentation.
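As a minimal sketch of the two ways to supply `control`, assuming the RWeka package (which provides `Weka_control` and `WOW`) and a working Java installation are available: the same setting is passed once as a `Weka_control` object and once as a character vector. The option `N` (number of clusters) is the one used in this page's own example; the object names `cl_obj` and `cl_chr` are illustrative only.

```r
library(RWeka)

## List the options Weka documents for SimpleKMeans.
WOW("SimpleKMeans")

## The same setting in the two accepted forms of 'control'.
cl_obj <- SimpleKMeans(iris[, -5], control = Weka_control(N = 3))
cl_chr <- SimpleKMeans(iris[, -5], control = c("-N", "3"))
```

The `Weka_control` form is usually easier to read; the character form mirrors the Weka command-line syntax and is handy when options are copied from the Weka documentation.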

A list inheriting from class `Weka_clusterers` with components including

a reference (of class `jobjRef`) to a Java object obtained by applying
the Weka `buildClusterer` method to the training instances using the
given control options.

a vector of integers indicating the class to which each training
instance is allocated (the results of calling the Weka `clusterInstance`
method for the built clusterer and each instance).

There is a `predict` method for predicting class ids or memberships
from the fitted clusterers.
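A short sketch of the `predict` method, assuming RWeka is installed and that the method accepts `newdata` and a `type` argument taking `"class_ids"` or `"memberships"`; check the help for the `predict` method in your installed version, since memberships may not be available for every clusterer. The object names are illustrative.

```r
library(RWeka)

cl <- SimpleKMeans(iris[, -5], Weka_control(N = 3))

## Class ids for the training data (no new data supplied).
ids <- predict(cl)

## Class ids for new observations; the first five rows are reused here.
new_ids <- predict(cl, newdata = iris[1:5, -5], type = "class_ids")

## Cluster memberships instead of hard ids
## (may not be supported by every clusterer).
mem <- predict(cl, type = "memberships")
```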

`Cobweb` implements the Cobweb (Fisher, 1987) and Classit
(Gennari et al., 1989) clustering algorithms.

`FarthestFirst` provides the “farthest first traversal algorithm” by
Hochbaum and Shmoys, which works as a fast simple approximate clusterer
modeled after simple \(k\)-means.

`SimpleKMeans` provides clustering with the \(k\)-means algorithm.

`XMeans` provides \(k\)-means extended by an “Improve-Structure part”
and automatically determines the number of clusters.

`DBScan` provides the “density-based clustering algorithm” by Ester,
Kriegel, Sander, and Xu. Note that noise points are assigned to `NA`.
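Because noise points come back as `NA`, downstream code usually has to handle missing ids explicitly. The sketch below uses a hand-written vector `ids` as a stand-in for DBScan output (it is not produced by Weka, and all names are hypothetical) to show the base R idioms involved.

```r
## Hypothetical class ids standing in for DBScan output; NA marks noise.
ids <- c(0L, 0L, 1L, NA, 1L, NA, 0L)

## Count cluster sizes, keeping noise visible as its own category.
sizes <- table(ids, useNA = "ifany")

## Logical mask of noise points, e.g. for dropping them before plotting.
noise <- is.na(ids)
n_noise <- sum(noise)
clustered <- ids[!noise]
```

Note that `table()` silently drops `NA` by default, so a cross-tabulation of DBScan results without `useNA` would hide the noise points.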

M. Ester, H.-P. Kriegel, J. Sander, and X. Xu (1996).
A Density-Based Algorithm for Discovering Clusters in Large Spatial
Databases with Noise.
*Proceedings of the Second International Conference on Knowledge
Discovery and Data Mining (KDD'96)*,
Portland, OR, 226–231.
AAAI Press.

D. H. Fisher (1987).
Knowledge acquisition via incremental conceptual clustering.
*Machine Learning*, **2**(2), 139–172.
doi:10.1023/A:1022852608280.

J. Gennari, P. Langley, and D. H. Fisher (1989).
Models of incremental concept formation.
*Artificial Intelligence*, **40**, 11–62.

D. S. Hochbaum and D. B. Shmoys (1985).
A best possible heuristic for the \(k\)-center problem.
*Mathematics of Operations Research*, **10**(2), 180–184.
doi:10.1287/moor.10.2.180.

D. Pelleg and A. W. Moore (2000).
X-means: Extending K-means with Efficient Estimation of the Number of
Clusters.
In: *Seventeenth International Conference on Machine Learning*,
727–734.
Morgan Kaufmann.

I. H. Witten and E. Frank (2005).
*Data Mining: Practical Machine Learning Tools and Techniques*.
2nd Edition, Morgan Kaufmann, San Francisco.

```
# NOT RUN {
cl1 <- SimpleKMeans(iris[, -5], Weka_control(N = 3))
cl1
table(predict(cl1), iris$Species)
# }

# NOT RUN {
## Requires Weka package 'XMeans' to be installed.
## Use XMeans with a KDTree.
cl2 <- XMeans(iris[, -5],
              c("-L", 3, "-H", 7, "-use-kdtree",
                "-K", "weka.core.neighboursearch.KDTree -P"))
cl2
table(predict(cl2), iris$Species)
# }
```