The spambase data has 57 real-valued explanatory variables that
characterize the contents of an email and one binary response
variable indicating whether the email is spam. There are 4601 observations.
Of the 57 explanatory variables, 48 describe word frequency, 6 describe character frequency, and 3 describe sequences of capital letters.
Use names() to see the specific words, characters, or statistics in
each class of variable.
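A minimal sketch of that inspection, assuming the data is available as a data frame called spambase (the object name is taken from the description above; the package that provides it is not stated here, so loading it is left to the reader):

```r
# Assumption: a data frame named `spambase` has already been loaded
# from whichever package ships this data set.
stopifnot(is.data.frame(spambase))

# Number of observations (rows) and variables (columns).
dim(spambase)

# Variable names: the words, characters, and capital-letter statistics
# measured by the explanatory variables, plus the response variable.
names(spambase)
```

The order of the columns is not specified in this description, so it is safer to read the variable classes off the names than to rely on column positions.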
Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.