DAAG (version 1.25.4)

spam7: Spam E-mail Data

Description

The data consist of 4601 email items, of which 1813 were identified as spam. This is a reduced version of the full dataset, retaining only six of its 57 explanatory variables.

Usage

spam7

Format

Columns included are:

crl.tot

Total length of uninterrupted sequences of capitals

dollar

Occurrences of the dollar sign, as percent of total number of characters

bang

Occurrences of '!', as percent of total number of characters

money

Occurrences of 'money', as percent of total number of words

n000

Occurrences of the string '000', as percent of total number of words

make

Occurrences of 'make', as percent of total number of words

yesno

Outcome variable, a factor with levels n (not spam) and y (spam)

Examples

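library(rpart)
library(DAAG)  # provides the spam7 data frame

# Fit a classification tree for spam status using all six predictors
spam.rpart <- rpart(formula = yesno ~ crl.tot + dollar + bang +
   money + n000 + make, data = spam7)

# Draw the tree, then label its splits and leaves
plot(spam.rpart)
text(spam.rpart)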
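
As a possible follow-up to the plot, the sketch below checks the class counts against the Description and tabulates the tree's fitted classes against the recorded outcome. It assumes the DAAG and rpart packages are installed; the names class_counts, predicted, confusion and train_accuracy are illustrative only, and method = "class" is stated explicitly although rpart would normally infer it from the factor response.

```r
library(rpart)
library(DAAG)  # assumed installed; provides spam7

# Class balance: the Description reports 1813 spam items out of 4601
class_counts <- table(spam7$yesno)
print(class_counts)

# Refit the tree from the example above so this block runs on its own
spam.rpart <- rpart(yesno ~ crl.tot + dollar + bang + money + n000 + make,
                    data = spam7, method = "class")

# Cross-tabulate fitted classes against the recorded outcome
predicted <- predict(spam.rpart, type = "class")
confusion <- table(predicted = predicted, actual = spam7$yesno)
print(confusion)

# Resubstitution (training) accuracy; optimistic for unseen mail
train_accuracy <- sum(diag(confusion)) / sum(confusion)
print(train_accuracy)
```

Because the accuracy here is computed on the same data used to fit the tree, it should be read as an upper bound; the cross-validated error reported by printcp(spam.rpart) is likely a better guide to performance on new mail.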
