Soybean Database

There are 19 classes, only the first 15 of which have been used in prior work. The folklore seems to be that the last four classes are unjustified by the data since they have so few examples. There are 35 categorical attributes, some nominal and some ordered. The value "dna" means "does not apply". The values for attributes are encoded numerically, with the first value encoded as "0", the second as "1", and so forth.


A data frame with 683 observations on 36 variables: 35 categorical attributes, all numerically encoded, and one nominal variable denoting the class.
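The dimensions and the class sizes can be checked directly, which also makes the remark about sparsely populated classes easy to inspect. This is a minimal sketch that assumes the mlbench package is installed; the variable name `class_counts` is introduced here for illustration only.

```r
library(mlbench)
data(Soybean)

# Number of observations and variables in the data frame.
print(dim(Soybean))

# Number of distinct classes in the Class column.
print(nlevels(Soybean$Class))

# Examples per class, smallest first, so sparsely populated classes
# appear at the top of the printout.
class_counts <- sort(table(Soybean$Class))
print(class_counts)
```

Note that the order of the factor levels in R need not match the class order of the original data file, so "first 15" should not be read off the level order without checking.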

[,1]   Class            the 19 classes
[,2]   date             apr(0),may(1),june(2),july(3),aug(4),sept(5),oct(6).
[,3]   plant.stand      normal(0),lt-normal(1).
[,4]   precip           lt-norm(0),norm(1),gt-norm(2).
[,5]   temp             lt-norm(0),norm(1),gt-norm(2).
[,6]   hail             yes(0),no(1).
[,7]   crop.hist        dif-lst-yr(0),s-l-y(1),s-l-2-y(2),s-l-7-y(3).
[,8]   area.dam         scatter(0),low-area(1),upper-ar(2),whole-field(3).
[,9]   sever            minor(0),pot-severe(1),severe(2).
[,10]  seed.tmt         none(0),fungicide(1),other(2).
[,11]  germ             90-100%(0),80-89%(1),lt-80%(2).
[,12]  plant.growth     norm(0),abnorm(1).
[,13]  leaves           norm(0),abnorm(1).
[,14]  leaf.halo        absent(0),yellow-halos(1),no-yellow-halos(2).
[,15]  leaf.marg        w-s-marg(0),no-w-s-marg(1),dna(2).
[,16]  leaf.size        lt-1/8(0),gt-1/8(1),dna(2).
[,17]  leaf.shread      absent(0),present(1).
[,18]  leaf.malf        absent(0),present(1).
[,19]  leaf.mild        absent(0),upper-surf(1),lower-surf(2).
[,20]  stem             norm(0),abnorm(1).
[,21]  lodging          yes(0),no(1).
[,22]  stem.cankers     absent(0),below-soil(1),above-s(2),ab-sec-nde(3).
[,23]  canker.lesion    dna(0),brown(1),dk-brown-blk(2),tan(3).
[,24]  fruiting.bodies  absent(0),present(1).
[,25]  ext.decay        absent(0),firm-and-dry(1),watery(2).
[,26]  mycelium         absent(0),present(1).
[,27]  int.discolor     none(0),brown(1),black(2).
[,28]  sclerotia        absent(0),present(1).
[,29]  fruit.pods       norm(0),diseased(1),few-present(2),dna(3).
[,30]  fruit.spots      absent(0),col(1),br-w/blk-speck(2),distort(3),dna(4).
[,31]  seed             norm(0),abnorm(1).
[,32]  mold.growth      absent(0),present(1).
[,33]  seed.discolor    absent(0),present(1).
[,34]  seed.size        norm(0),lt-norm(1).
[,35]  shriveling       absent(0),present(1).
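The zero-based encoding means a raw code has to be shifted by one before it can index a vector of labels in R. The following is a minimal sketch in base R using the `date` attribute; the labels are copied from the listing, while `raw_codes` is hypothetical example input and not taken from the data set.

```r
# Labels for the `date` attribute in code order: apr is 0, oct is 6.
date_labels <- c("apr", "may", "june", "july", "aug", "sept", "oct")

# Hypothetical raw codes as they might appear in a numerically encoded file;
# NA stands for a missing value.
raw_codes <- c(0, 3, 6, NA, 1)

# Codes start at 0 but R vectors are indexed from 1, so add one.
# Missing codes stay NA.
decoded <- factor(date_labels[raw_codes + 1], levels = date_labels)
print(decoded)

# Going back from labels to the numeric codes.
recoded <- as.integer(decoded) - 1L
print(recoded)
```

The same pattern applies to any other attribute by swapping in its label vector.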


  • Source: R.S. Michalski and R.L. Chilausky "Learning by Being Told and Learning from Examples: An Experimental Comparison of the Two Methods of Knowledge Acquisition in the Context of Developing an Expert System for Soybean Disease Diagnosis", International Journal of Policy Analysis and Information Systems, Vol. 4, No. 2, 1980.
  • Donor: Ming Tan & Jeff Schlimmer

These data have been taken from the UCI Repository Of Machine Learning Databases and were converted to R format by Evgenia Dimitriadou.


Tan, M., & Eshelman, L. (1988). Using weighted networks to represent classification knowledge in noisy domains. Proceedings of the Fifth International Conference on Machine Learning (pp. 121-134). Ann Arbor, Michigan: Morgan Kaufmann.
  -- IWN recorded a 97.1% classification accuracy
  -- 290 training and 340 test instances

Fisher, D.H. & Schlimmer, J.C. (1988). Concept Simplification and Predictive Accuracy. Proceedings of the Fifth International Conference on Machine Learning (pp. 22-28). Ann Arbor, Michigan: Morgan Kaufmann.
  -- Notes why this database is highly predictable

Newman, D.J., Hettich, S., Blake, C.L. & Merz, C.J. (1998). UCI Repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science.

library(mlbench)
data(Soybean)
summary(Soybean)
Documentation reproduced from package mlbench, version 2.1-1, License: GPL-2
