rodham (version 0.1.1)

get_emails: Get emails and their contents

Description

Get the content of Hillary Rodham Clinton's emails by release.

Usage

get_emails(release, save.dir = getwd(), extractor, ...)

Arguments

release

Name of the email release batch; see Details.

save.dir

Directory in which to save the extracted text; defaults to getwd().

extractor

Full path to the pdftotext PDF extractor; see Details.

...

Additional parameters to pass to pdftotext.

Value

Fetches the email zip file from the WSJ and extracts the text files into save.dir. Returns the full path to the directory that contains the parsed txt files.
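Because the return value is a directory path, the parsed files can be read back with base R. The following is a minimal sketch, assuming the files sit directly in the returned directory and carry a .txt extension; read_parsed_emails is a hypothetical helper and is not part of rodham.

```r
# Hypothetical helper (not part of rodham): read every .txt file in a
# directory into a named list of character vectors, one element per file.
read_parsed_emails <- function(dir) {
  # Assumption: parsed files are at the top level of `dir` and end in .txt
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  emails <- lapply(files, readLines, warn = FALSE)
  names(emails) <- basename(files)
  emails
}

# Usage, assuming `emails_dir` holds the path returned by get_emails():
# emails <- read_parsed_emails(emails_dir)
# length(emails)  # number of files read
```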

Details

Below are the valid values for release; they follow the WSJ naming convention.

  • Benghazi

  • June

  • July

  • August

  • September

  • October

  • November

  • January 7

  • January 29

  • February 19

  • february 29

  • December

  • Non-disclosure

The extractor argument is the full path to your pdftotext.exe extractor; visit xpdf to download it, or try get_xpdf, which attempts to download and unzip the PDF-to-text extractor. See Examples.
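Since a call depends on both a valid release name and an existing extractor, it can help to check both before starting the download. The following is a rough sketch in base R: check_inputs and valid_releases are hypothetical names that are not part of rodham, the release names are copied verbatim from the list above, and exact, case-sensitive matching is an assumption about how get_emails compares them.

```r
# Release names copied verbatim from the Details list above.
valid_releases <- c(
  "Benghazi", "June", "July", "August", "September", "October",
  "November", "January 7", "January 29", "February 19", "february 29",
  "December", "Non-disclosure"
)

# Hypothetical pre-flight check (not part of rodham).
# Assumption: release names must match the list exactly.
check_inputs <- function(release, extractor) {
  if (!release %in% valid_releases) {
    stop("Unknown release: ", release)
  }
  if (!file.exists(extractor)) {
    stop("Extractor not found at: ", extractor)
  }
  invisible(TRUE)
}

# Usage, assuming `ext` holds the full path to pdftotext:
# check_inputs("August", ext)
```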

See Also

get_xpdf, download_emails, extract_emails

Examples

# NOT RUN {
# get xpdf extractor
ext <- get_xpdf()

# create a directory for the extracted text
dir.create("emails")

# get emails released in August
emails_aug <- get_emails(release = "August", save.dir = "./emails",
                         extractor = ext)

# use manually downloaded extractor
# ext <- "C:/xpdfbin-win-3.04/bin64/pdftotext.exe"

# get emails related to Benghazi released in December
emails_bengh <- get_emails(release = "Benghazi", extractor = ext,
                           save.dir = "./emails")
# }