Soybean: Soybean Database
Description
There are 19 classes, only the first 15 of which have been used in prior
work. The folklore is that the last four classes are not justified by the
data, since they have so few examples.
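A frequency threshold is one simple way to act on this rare-class concern. The Python sketch below is not part of the R package. It assumes each observation is a dict with a "Class" key, and the helper names, the threshold `min_count` and the toy labels are all hypothetical. Which classes survive depends entirely on the chosen threshold. Prior work instead kept the first 15 classes.

```python
from collections import Counter


def frequent_classes(labels, min_count):
    """Return the set of class labels that occur at least `min_count` times."""
    counts = Counter(labels)
    return {label for label, n in counts.items() if n >= min_count}


def drop_rare(rows, min_count):
    """Keep only rows whose "Class" value belongs to a frequent class."""
    keep = frequent_classes([r["Class"] for r in rows], min_count)
    return [r for r in rows if r["Class"] in keep]


# Toy rows with hypothetical labels; these are not real Soybean records.
rows = [{"Class": "a"}] * 5 + [{"Class": "b"}] * 4 + [{"Class": "c"}]
print(len(drop_rare(rows, min_count=3)))  # 9
```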
There are 35 categorical attributes, some nominal and some ordered. The
value ``dna'' means ``does not apply''. The values for attributes are
encoded numerically, with the first value encoded as ``0'', the second as
``1'', and so forth.
Format
A data frame with 683 observations on 36 variables: 35 categorical
attributes, all encoded numerically, and one nominal variable denoting the
class.
[,1]  Class            the 19 classes.
[,2]  date             apr(0),may(1),june(2),july(3),aug(4),sept(5),oct(6).
[,3]  plant.stand      normal(0),lt-normal(1).
[,4]  precip           lt-norm(0),norm(1),gt-norm(2).
[,5]  temp             lt-norm(0),norm(1),gt-norm(2).
[,6]  hail             yes(0),no(1).
[,7]  crop.hist        dif-lst-yr(0),s-l-y(1),s-l-2-y(2),s-l-7-y(3).
[,8]  area.dam         scatter(0),low-area(1),upper-ar(2),whole-field(3).
[,9]  sever            minor(0),pot-severe(1),severe(2).
[,10] seed.tmt         none(0),fungicide(1),other(2).
[,11] germ             90-100%(0),80-89%(1),lt-80%(2).
[,12] plant.growth     norm(0),abnorm(1).
[,13] leaves           norm(0),abnorm(1).
[,14] leaf.halo        absent(0),yellow-halos(1),no-yellow-halos(2).
[,15] leaf.marg        w-s-marg(0),no-w-s-marg(1),dna(2).
[,16] leaf.size        lt-1/8(0),gt-1/8(1),dna(2).
[,17] leaf.shread      absent(0),present(1).
[,18] leaf.malf        absent(0),present(1).
[,19] leaf.mild        absent(0),upper-surf(1),lower-surf(2).
[,20] stem             norm(0),abnorm(1).
[,21] lodging          yes(0),no(1).
[,22] stem.cankers     absent(0),below-soil(1),above-s(2),ab-sec-nde(3).
[,23] canker.lesion    dna(0),brown(1),dk-brown-blk(2),tan(3).
[,24] fruiting.bodies  absent(0),present(1).
[,25] ext.decay        absent(0),firm-and-dry(1),watery(2).
[,26] mycelium         absent(0),present(1).
[,27] int.discolor     none(0),brown(1),black(2).
[,28] sclerotia        absent(0),present(1).
[,29] fruit.pods       norm(0),diseased(1),few-present(2),dna(3).
[,30] fruit.spots      absent(0),col(1),br-w/blk-speck(2),distort(3),dna(4).
[,31] seed             norm(0),abnorm(1).
[,32] mold.growth      absent(0),present(1).
[,33] seed.discolor    absent(0),present(1).
[,34] seed.size        norm(0),lt-norm(1).
[,35] shriveling       absent(0),present(1).
[,36] roots            norm(0),rotted(1),galls-cysts(2).
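Each code is the position of a value in its attribute's list, so decoding is a list lookup. The table also shows that ``dna'' has no fixed code. It is 2 for leaf.size, 0 for canker.lesion and 4 for fruit.spots. The Python sketch below copies a few value lists from the table and maps ``dna'' to None, so that it is not mistaken for an ordinary level. It assumes the codes are available as integers. The dictionary `LEVELS` and the function `decode` are hypothetical illustrations, not part of any package.

```python
# Value lists copied from the table above; a label's list position is its code.
LEVELS = {
    "date": ["apr", "may", "june", "july", "aug", "sept", "oct"],
    "leaf.size": ["lt-1/8", "gt-1/8", "dna"],
    "canker.lesion": ["dna", "brown", "dk-brown-blk", "tan"],
    "fruit.spots": ["absent", "col", "br-w/blk-speck", "distort", "dna"],
}


def decode(attribute, code):
    """Return the label for an integer code, or None when it means "dna"."""
    levels = LEVELS[attribute]
    # Reject out-of-range codes (Python would otherwise accept negative indices).
    if not 0 <= code < len(levels):
        raise ValueError(f"code {code} is out of range for {attribute}")
    label = levels[code]
    # "dna" (does not apply) is not a real level, so surface it as None.
    return None if label == "dna" else label


print(decode("date", 3))           # july
print(decode("canker.lesion", 0))  # None
print(decode("fruit.spots", 4))    # None
```

Returning None keeps ``dna'' out of any ordering. For attributes whose values are ordered, such as date, the remaining integer codes can still be compared directly.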
Source
- Source: R.S. Michalski and R.L. Chilausky, "Learning by
Being Told and Learning from Examples: An Experimental
Comparison of the Two Methods of Knowledge Acquisition in the
Context of Developing an Expert System for Soybean Disease
Diagnosis", International Journal of Policy Analysis and
Information Systems, Vol. 4, No. 2, 1980.
- Donor: Ming Tan & Jeff Schlimmer (Jeff.Schlimmer%cs.cmu.edu)
These data have been taken from the UCI Repository of Machine Learning
Databases at
- ftp://ftp.ics.uci.edu/pub/machine-learning-databases
- http://www.ics.uci.edu/~mlearn/MLRepository.html
and were converted to R format by Evgenia.Dimitriadou@ci.tuwien.ac.at.
References
Tan, M., & Eshelman, L. (1988). Using weighted networks to represent
classification knowledge in noisy domains. Proceedings of the Fifth
International Conference on Machine Learning (pp. 121-134). Ann Arbor,
Michigan: Morgan Kaufmann.
-- IWN recorded a 97.1% classification accuracy
-- 290 training and 340 test instances
Fisher, D.H., & Schlimmer, J.C. (1988). Concept simplification and
predictive accuracy. Proceedings of the Fifth
International Conference on Machine Learning (pp. 22-28). Ann Arbor,
Michigan: Morgan Kaufmann.
-- Notes why this database is highly predictable