A data frame with 50 observations on the following 21 variables.
- spam
Indicator for whether the email was spam.
- to_multiple
Indicator for whether the email was addressed to more
than one recipient.
- from
Whether the message was listed as from anyone (this is usually
set by default for regular outgoing email).
- cc
Indicator for whether anyone was CCed.
- sent_email
Indicator for whether the sender had been sent an email
in the last 30 days.
- time
Time at which email was sent.
- image
The number of images attached.
- attach
The number of attached files.
- dollar
The number of times a dollar sign or the word “dollar”
appeared in the email.
- winner
Indicates whether “winner” appeared in the email.
- inherit
The number of times “inherit” (or an extension, such
as “inheritance”) appeared in the email.
- viagra
The number of times “viagra” appeared in the email.
- password
The number of times “password” appeared in the email.
- num_char
The number of characters in the email, in thousands.
- line_breaks
The number of line breaks in the email (does not count
text wrapping).
- format
Indicates whether the email was written using HTML (e.g. may
have included bolding or active links).
- re_subj
Whether the subject started with “Re:”, “RE:”,
“re:”, or “rE:”
- exclaim_subj
Whether there was an exclamation point in the subject.
- urgent_subj
Whether the word “urgent” was in the email subject.
- exclaim_mess
The number of exclamation points in the email message.
- number
Factor variable saying whether there was no number, a small
number (under 1 million), or a big number.