gutenberg_download: Download one or more works using a Project Gutenberg ID

Description

Download one or more works by their Project Gutenberg IDs into a data frame with one row per line per work. This can be used to download a single work of interest or several at a time. You can look up the Gutenberg ID of a work using the gutenberg_works() function or the gutenberg_metadata dataset.

Usage

gutenberg_download(gutenberg_id, mirror = NULL, strip = TRUE,
  meta_fields = NULL, verbose = TRUE, ...)

Arguments

gutenberg_id

A vector of Project Gutenberg IDs, or a data frame containing a gutenberg_id column, such as the result of a gutenberg_works() call

mirror

Optionally, a mirror URL to retrieve the books from. By default, the mirror returned by gutenberg_get_mirror is used

strip

Whether to strip suspected headers and footers using the gutenberg_strip function

meta_fields

Additional fields, such as title and author, to add from gutenberg_metadata describing each book. This is useful when returning multiple books

verbose

Whether to show messages about the Project Gutenberg mirror that was chosen

...

Extra arguments passed to gutenberg_strip, currently unused

Value

A two-column tbl_df (a type of data frame; see the tibble or dplyr packages) with one row for each line of the text or texts, with columns:

gutenberg_id

Integer column with the Project Gutenberg ID of each text

text

A character vector

Details

Note that if strip = TRUE, this tries to remove the Gutenberg header and footer using the gutenberg_strip function. This is not an exact process since headers and footers differ between books. Before doing an in-depth analysis you may want to check the start and end of each downloaded book.
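The check suggested above can be sketched with base R. This is only an illustration: peek_edges is a hypothetical helper, not part of the package, and the small data frame below is a made-up stand-in for the result of gutenberg_download() (it assumes only the gutenberg_id and text columns), so that the sketch runs without a network connection.

```r
# Toy stand-in for the result of gutenberg_download(): one row per line,
# with gutenberg_id and text columns (the IDs and lines here are made up).
books <- data.frame(
  gutenberg_id = rep(c(1L, 2L), each = 5),
  text = c(
    "Produced by a volunteer", "", "CHAPTER I", "First book body.", "THE END",
    "TITLE PAGE", "", "CHAPTER 1", "Second book body.", "End of the Project"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first and last `n` lines of each book,
# labelled "start" or "end", so leftover header/footer text is easy to spot.
peek_edges <- function(books, n = 2) {
  per_book <- split(books, books$gutenberg_id)
  edges <- lapply(per_book, function(b) {
    start <- head(b, n)
    start$part <- "start"
    end <- tail(b, n)
    end$part <- "end"
    rbind(start, end)
  })
  out <- do.call(rbind, edges)
  rownames(out) <- NULL
  out
}

edges <- peek_edges(books, n = 2)
print(edges)
```

With real data, the same helper could be applied to the data frame returned by gutenberg_download(), using a larger n (for example 20) to see enough context. For books shorter than 2 * n lines, the "start" and "end" rows overlap.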

Examples

library(dplyr)
library(gutenbergr)

# download The Count of Monte Cristo
gutenberg_download(1184)

# download two books: Wuthering Heights and Jane Eyre
books <- gutenberg_download(c(768, 1260), meta_fields = "title")
books
books %>% count(title)

# download all books from Jane Austen
austen <- gutenberg_works(author == "Austen, Jane") %>%
  gutenberg_download(meta_fields = "title")

austen
austen %>%
  count(title)
