openintro (version 1.7.1)

email50: Sample of 50 emails

Description

This is a subsample of 50 emails drawn from the email data set.

Usage

data(email50)

Format

A data frame with 50 observations on the following 21 variables.

spam

Indicator for whether the email was spam.

to_multiple

Indicator for whether the email was addressed to more than one recipient.

from

Whether the message was listed as from anyone (this is usually set by default for regular outgoing email).

cc

Indicator for whether anyone was CCed.

sent_email

Indicator for whether the sender had been sent an email in the last 30 days.

time

Time at which email was sent.

image

The number of images attached.

attach

The number of attached files.

dollar

The number of times a dollar sign or the word “dollar” appeared in the email.

winner

Indicates whether “winner” appeared in the email.

inherit

The number of times “inherit” (or an extension, such as “inheritance”) appeared in the email.

viagra

The number of times “viagra” appeared in the email.

password

The number of times “password” appeared in the email.

num_char

The number of characters in the email, in thousands.

line_breaks

The number of line breaks in the email (does not count text wrapping).

format

Indicates whether the email was written using HTML (e.g. may have included bolding or active links).

re_subj

Whether the subject started with “Re:”, “RE:”, “re:”, or “rE:”.

exclaim_subj

Whether there was an exclamation point in the subject.

urgent_subj

Whether the word “urgent” was in the email subject.

exclaim_mess

The number of exclamation points in the email message.

number

Factor variable indicating whether the email contained no number, a small number (under 1 million), or a big number.
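
The variables above can be inspected directly once the data are loaded. The following is a minimal sketch, assuming the openintro package is installed. The object names spam_by_number and mean_chars are illustrative only, and the storage type of spam (numeric or factor) may differ between package versions.

```r
# Assumes the openintro package is installed: install.packages("openintro")
data(email50, package = "openintro")

# Check the dimensions against the Format section above
dim(email50)

# Overview of each variable's type and first few values
str(email50)

# Cross-tabulate the spam indicator against the number category
spam_by_number <- table(spam = email50$spam, number = email50$number)
print(spam_by_number)

# Mean message length (in thousands of characters) for each spam value
mean_chars <- tapply(email50$num_char, email50$spam, mean)
print(mean_chars)
```

Because num_char is recorded in thousands of characters, a mean of 10 would correspond to roughly 10,000 characters per message.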

References

OpenIntro Statistics, openintro.org

See Also

email, county

Examples

data(email50)
data(email)

# Draw 50 rows from email with a fixed seed, then compare with email50
set.seed(5)
d <- email[sample(nrow(email), 50), ][c(1:25, 27:50, 26), ]
identical(d, email50)

# the "[c(1:25, 27:50, 26), ]" index was added to reorder the cases
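
The comparison above depends on the random number generator. R 3.6.0 changed the default algorithm used by sample(), so the same seed can select different rows in newer R versions. The sketch below assumes the subsample was originally drawn with the older "Rounding" sampler. That is an assumption, not something this page states. It requires R 3.6.0 or later, where RNGkind() accepts the sample.kind argument, and the result may still be FALSE if the stored data differ between package versions.

```r
# Assumes the openintro package is installed
data(email, package = "openintro")
data(email50, package = "openintro")

# Assumption: email50 was drawn with the pre-3.6.0 "Rounding" sampler.
# Switching to it emits a warning, which is suppressed here.
suppressWarnings(RNGkind(sample.kind = "Rounding"))
set.seed(5)
idx <- sample(nrow(email), 50)

# Restore the current default sampler
RNGkind(sample.kind = "Rejection")

# Apply the same reordering as in the example above
d <- email[idx, ][c(1:25, 27:50, 26), ]

# May be TRUE or FALSE depending on the R and package versions in use
identical(d, email50)
```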