BayesLogit (version 0.5.1)

spambase: Spambase Data

Description

The spambase data has 57 real-valued explanatory variables that characterize the contents of an email and one binary response variable indicating whether the email is spam. There are 4601 observations.

Format

A data frame with 4601 rows and 58 columns: the first column is the binary response variable indicating whether the email is spam, and the remaining 57 columns are real-valued explanatory variables.
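Because the response sits in the first column, a common first step is to separate it from the predictors. The following is a minimal sketch on a toy stand-in data frame with the same layout; the helper `split_response`, the toy column names, and the toy values are all hypothetical and are not part of the package.

```r
# Toy stand-in with the documented layout: response first, predictors after.
# Column names and values are made up for illustration only.
toy <- data.frame(
  is.spam = c(1, 0, 1, 0),
  word.freq.example = c(0.5, 0.0, 1.2, 0.0),
  char.freq.example = c(0.1, 0.0, 0.3, 0.0),
  capital.run.length.example = c(10, 1, 25, 2)
)

# Hypothetical helper: split a data frame into a response vector (column 1)
# and a numeric predictor matrix (all remaining columns).
split_response <- function(df) {
  list(
    y = df[[1]],
    X = as.matrix(df[, -1, drop = FALSE])
  )
}

parts <- split_response(toy)
dim(parts$X)  # rows by number of predictors
```

With the package installed, the same helper could presumably be applied to the real data after `data(spambase, package = "BayesLogit")`, assuming the loaded object is named `spambase`; by the description above, the predictor matrix would then have 4601 rows and 57 columns.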

Details

Of the 57 explanatory variables, 48 describe word frequency, 6 describe character frequency, and 3 describe sequences of capital letters.

word.freq.
A continuous explanatory variable describing the frequency with which the word appears; measured in percent.

char.freq.
A continuous explanatory variable describing the frequency with which the character appears; measured in percent.

capital.run.length.
A statistic (average, longest, or total) of the length of runs of consecutive capital letters.

Call names() on the data frame to see the specific words, characters, or statistics covered by each class of variable.
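The three variable classes can be told apart by the column-name prefixes listed above. The sketch below counts columns per prefix; the helper `count_by_prefix` and the toy column names are hypothetical, chosen only to make the example runnable without the package.

```r
# Prefixes taken from the variable classes described above.
prefixes <- c("word.freq.", "char.freq.", "capital.run.length.")

# Hypothetical helper: count how many column names start with each prefix.
count_by_prefix <- function(col_names, prefixes) {
  vapply(prefixes, function(p) sum(startsWith(col_names, p)), integer(1))
}

# Toy column names for illustration; the suffixes are made up.
toy_names <- c("is.spam", "word.freq.aaa", "word.freq.bbb",
               "char.freq.x", "capital.run.length.stat")

counts <- count_by_prefix(toy_names, prefixes)
counts

# Columns belonging to one class can be listed with grep():
grep("^word\\.freq\\.", toy_names, value = TRUE)
```

Applied to the names of the real data frame, and assuming its columns follow these prefixes, the counts should match the 48, 6, and 3 breakdown given above.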

References

Mark Hopkins, Erik Reeber, George Forman, and Jaap Suermondt of Hewlett-Packard Labs (1999). Spambase Data Set. http://archive.ics.uci.edu/ml/datasets/Spambase

Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.