soybean

<p>There are 19 classes, only the first 15 of which have been used in prior work. The folklore seems to be that the last four classes are unjustified by the data since they have so few examples. There are 35 categorical attributes, some nominal and some ordered. The value 'dna' means "does not apply". The values for attributes are encoded numerically, with the first value encoded as '0', the second as '1', and so forth. Unknown values were imputed using the mice package.</p>
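The zero-based numeric encoding described above can be illustrated with a small base R sketch. The attribute values below (`low`, `medium`, `high`, `dna`) are hypothetical and chosen only for illustration; they are not taken from the data set, and this is not necessarily how the packaged data was produced.

```r
# Hypothetical attribute values, listed in their documented order.
# These names are illustrative assumptions, not actual soybean levels.
levels_in_order <- c("low", "medium", "high", "dna")

# A few hypothetical raw observations of that attribute.
raw <- c("medium", "dna", "low", "high", "medium")

# factor() assigns 1-based integer codes following `levels_in_order`;
# subtracting 1 gives the zero-based encoding (first value -> 0, second -> 1, ...).
codes <- as.integer(factor(raw, levels = levels_in_order)) - 1L

# Decoding reverses the mapping by indexing back into the level vector.
decoded <- levels_in_order[codes + 1L]

print(codes)
print(decoded)
```

Keeping the level vector in the documented order is what makes the codes reproducible; letting `factor()` sort the levels alphabetically would assign different numbers.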

datasets

Gaussian mixture models, k-means, mini-batch-kmeans, k-medoids and affinity propagation clustering with the option to plot, validate, predict (new data) and estimate the optimal number of clusters. The package takes advantage of 'RcppArmadillo' to speed up the computationally intensive parts of the functions. For more information, see (i) "Clustering in an Object-Oriented Environment" by Anja Struyf, Mia Hubert, Peter Rousseeuw (1997), Journal of Statistical Software, <doi:10.18637/jss.v001.i04>; (ii) "Web-scale k-means clustering" by D. Sculley (2010), ACM Digital Library, <doi:10.1145/1772690.1772862>; (iii) "Armadillo: a template-based C++ library for linear algebra" by Sanderson et al (2016), The Journal of Open Source Software, <doi:10.21105/joss.00026>; (iv) "Clustering by Passing Messages Between Data Points" by Brendan J. Frey and Delbert Dueck, Science 16 Feb 2007: Vol. 315, Issue 5814, pp. 972-976, <doi:10.1126/science.1136800>.

Lampros Mouselimis

ClusterR

Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans, K-Medoids and Affinity Propagation Clustering

Conrad Sanderson

Ryan Curtin

Siddharth Agrawal

Brendan Frey

Delbert Dueck

soybean function

<p>A data frame with 307 instances and 36 attributes (including the class attribute, "class").</p>
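A minimal sketch of loading and inspecting the data frame, assuming the ClusterR package is installed and exposes the data set under the name `soybean`. The guard around the call is an addition for illustration, so the snippet does nothing if the package is unavailable.

```r
# Assumes ClusterR is installed; skip silently otherwise.
if (requireNamespace("ClusterR", quietly = TRUE)) {
  # Load the data set into the current environment.
  utils::data("soybean", package = "ClusterR")

  # Number of rows (instances) and columns (attributes plus the class column).
  print(dim(soybean))

  # Number of instances per class; the later classes are expected to be sparse.
  print(table(soybean$class))
}
```

Tabulating the `class` column is a quick way to see why the classes with very few examples are usually left out of analyses.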

Format

The soybean (large) data set from the UCI repository — soybean
