igraphdata (version 1.0.1)

enron: Enron Email Network

Description

An Enron email dataset has been made public by the U.S. Department of Justice.

Usage

enron

Arguments

Format

A directed igraph graph object. Graph attributes:
  • ‘LDC_names’ The names of the 32 LDC catagories the emails are classfied into by Michael W. Berry (http://www.cis.jhu.edu/~parky/Enron/Anno_Topic_exp_LDC.pdf)
  • ‘LDC_desc’ Longer descriptions of the 32 LDC categories.
  • ‘Citation’ Additionally, see also the references below.
  • ‘name’
Vertex attributes:
  • ‘Email’ Email address.
  • ‘Name’ Real name.
  • ‘Note’ E.g. position at Enron.
Edge attributes:
  • ‘Time’ When the email was sent. Note that some time labels are from 1979, these are certainly wrong and you might want to remove them before analyses that include time.
  • ‘Reciptype’ Recipient type, ‘to’, ‘cc’ or ‘bcc’.
  • ‘Topic’ Assigned based on 3-means clustering of randomly selected 3,120 out of all 125,409 messages, then NN classification for the whole corpus. Note that topic 0 means an outlier, e.g., too few words or all meaningless numbers in the message body.
  • ‘LDC_topic’ Assigned based on Michael W. Berry's 2001 “Annotated (by Topic) Enron Email Data Set.” (http://www.cis.jhu.edu/~parky/Enron/Anno_Topic_exp_LDC.pdf) There are 32 topics. Topic "0" means an outlier, e.g., too few words or all meaningless numbers in the message body, etc. Topic "-1" means there is no matching topic.

References

C.E. Priebe, J.M. Conroy, D.J. Marchette, and Y. Park, Scan Statistics on Enron Graphs Computational and Mathematical Organization Theory, Volume 11, Number 3, p229 - 247, October 2005, Springer Science+Business Media B.V. C.E. Priebe, J.M. Conroy, D.J. Marchette, and Y. Park, Scan Statistics on Enron Graphs, SIAM International Conference on Data Mining, Workshop on Link Analysis, Counterterrorism and Security, Newport Beach, California, April 23, 2005. Gina Kolata, Enron Offers an Unlikely Boost to E-Mail Surveillance, New York Times, Week in Review, May 22, 2005. C.E. Priebe, Scan Statistics on Enron Graphs, IPAM Summer Graduate School: Intelligent Extraction of Information from Graphs and High Dimensional Data, UCLA, July 11-29, 2005. C.E. Priebe, Scan Statistics on Enron Graphs, 2005 Fall Department of Applied Mathematics and Statistics Seminars, September 15, 2005, The Johns Hopkins University.

Y. Park, C.E. Priebe, D.J. Marchette, Scan Statistics on Enron Hypergraphs, Interface 2008, Durham, North Carolina, May 21, 2008, Y. Park, C.E. Priebe, D.J. Marchette, Anomaly Detection using Scan Statistics on Enron Graphs and Hypergraphs, The Satellite Workshop of the IASC 2008 Conference, Seoul, Korea, December 1-3, 2008.

Y. Park, C.E. Priebe, D.J. Marchette, A. Youssef, Anomaly Detection using Scan Statistics on Time Series of Hypergraphs, Workshop on Link Analysis, Counterterrorism and Security at the SIAM International Conference on Data Mining, Sparks, Nevada, May 1-3, 2009,

Y. Park, C.E. Priebe, A. Youssef, Anomaly Detection in Time Series of Graphs using Fusion of Invariants, Computational and Mathematical Organization Theory, submitted, 2010.