Learn R Programming

idm (version 1.7.1)

enron: enron data set

Description

The data set is a subset of the Enron e-mail corpus from the UCI Machine Learning Repository (Lichman, 2013). The original data is a collection of 39,861 email messages with roughly 6 million tokens and a 28,102 term vocabulary. The subset is a binary (presence/absence) data set containing the 80 most frequent words which appear in the original corpus.

Usage

data("enron")

Arguments

Format

A binary data frame with 39,861 observations (e-mail messages) on 80 variables (words).

References

Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

Examples

Run this code
data(enron)

Run the code above in your browser using DataLab