fulltext (version 1.0.1)

ft_chunks: Extract chunks of data from articles

Description

ft_chunks makes it easy to extract sections of an article. You can extract just authors across all articles, or all references sections, or the complete text of each article. Then you can pass the output downstream for visualization and analysis.

Usage

ft_chunks(x, what = "all")

ft_tabularize(x)

Arguments

x

An object of class ft_data, the output from a call to ft_get()

what

What to get: one or more chunk names, given as a vector or list. Defaults to "all". See Details.

Value

A list with one element for each chunk requested in what
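
To illustrate why list output is convenient downstream, here is a minimal base R sketch that binds a list of per-article chunks into a data.frame. The chunks object is a hand-built stand-in with made-up DOIs and publisher names; the real object returned by ft_chunks() may be nested differently, and this is not how ft_tabularize() is implemented.

```r
# Hand-built stand-in for chunk output (an assumption for illustration);
# the DOIs and publisher names below are placeholders, not real articles.
chunks <- list(
  list(doi = "10.1000/example.1", publisher = "Example Press"),
  list(doi = "10.1000/example.2", publisher = "Example Press")
)

# Turn each article's chunks into a one-row data.frame.
rows <- lapply(chunks, function(article) {
  as.data.frame(article, stringsAsFactors = FALSE)
})

# Stack the rows: one row per article, one column per requested chunk.
tab <- do.call(rbind, rows)
tab
```

In practice ft_tabularize() covers this step for ft_chunks() output; the sketch only shows the general list-to-table idea.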

Details

Options for the what parameter:

  • front - Publisher, journal and article metadata elements

  • body - Body of the article

  • back - Back of the article: acknowledgments, author contributions, references

  • title - Article title

  • doi - Article DOI

  • categories - Publisher's categories, if any

  • authors - Authors

  • keywords - Keywords

  • abstract - Article abstract

  • executive_summary - Article executive summary

  • refs - References

  • refs_dois - References DOIs - if available

  • publisher - Publisher name

  • journal_meta - Journal metadata

  • article_meta - Article metadata

  • acknowledgments - Acknowledgments

  • permissions - Article permissions

  • history - Dates: received, published, accepted, etc.

Note that only PLOS, eLife, Entrez, and Elsevier are currently supported; more to come.
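
Because what accepts several names at once, it can help to check them against the documented options before making any network calls. The following is a minimal sketch in base R; check_what() and valid_chunks are hypothetical names introduced here for illustration and are not part of fulltext.

```r
# Chunk names listed under Details, plus the default "all".
valid_chunks <- c(
  "all", "front", "body", "back", "title", "doi", "categories",
  "authors", "keywords", "abstract", "executive_summary", "refs",
  "refs_dois", "publisher", "journal_meta", "article_meta",
  "acknowledgments", "permissions", "history"
)

# Hypothetical helper (not a fulltext function): stops on unknown names,
# otherwise returns the requested names as a character vector.
check_what <- function(what) {
  what <- unlist(what)  # accept either a character vector or a list
  bad <- setdiff(what, valid_chunks)
  if (length(bad) > 0) {
    stop("Unknown chunk name(s): ", paste(bad, collapse = ", "))
  }
  what
}

check_what(c("doi", "history"))
check_what(list("title", "keywords"))
```

The checked vector could then be passed on as the what argument of ft_chunks().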

Examples

# NOT RUN {
library("fulltext")

# Fetch one PLOS article, collect its text, and extract the authors
x <- ft_get('10.1371/journal.pone.0086169', from='plos')
x %>% ft_collect() %>% ft_chunks(what="authors")

# Search PLOS for research articles, then extract individual chunks
library("rplos")
(dois <- searchplos(q="*:*", fl='id',
   fq=list('doc_type:full',"article_type:\"research article\""),
     limit=5)$data$id)
x <- ft_get(dois, from="plos")
x %>% ft_chunks("front")
x %>% ft_chunks("body")
x %>% ft_chunks("back")
x %>% ft_chunks("history")
x %>% ft_chunks(c("doi","history")) %>% ft_tabularize()
x %>% ft_chunks("authors")
x %>% ft_chunks(c("doi","categories"))
x %>% ft_chunks("all")
x %>% ft_chunks("publisher")
x %>% ft_chunks("acknowledgments")
x %>% ft_chunks("permissions")
x %>% ft_chunks("journal_meta")
x %>% ft_chunks("article_meta")

# Coerce list output to a data.frame, where possible
dois <- c('10.7554/elife.28589', '10.7554/elife.14009', '10.7554/elife.13941', 
  '10.7554/elife.22170', '10.7554/elife.29285')
x <- ft_get(dois) 
x <- x %>% ft_collect()
x$elife
x %>% ft_chunks("publisher") %>% ft_tabularize()
x %>% ft_chunks("refs") %>% ft_tabularize()
x %>% ft_chunks(c("doi","publisher")) %>% ft_tabularize()
x %>% ft_chunks(c("doi","publisher","permissions")) %>% ft_tabularize()

x <- ft_get(c("10.3389/fnagi.2014.00130",'10.1155/2014/249309',
  '10.1155/2014/162024'), from='entrez')
x <- x %>% ft_collect()
x %>% ft_chunks("doi") %>% ft_tabularize()
x %>% ft_chunks("authors") %>% ft_tabularize()
x %>% ft_chunks(c("doi","publisher","permissions")) %>% ft_tabularize()
x %>% ft_chunks("history") %>% ft_tabularize()

x <- ft_get('10.3389/fnagi.2014.00130', from='entrez')
x <- x %>% ft_collect()
x %>% ft_chunks("keywords")

# Piping workflow
opts <- list(fq=list('doc_type:full',"article_type:\"research article\""))
ft_search(query='ecology', from='plos', plosopts = opts)$plos$data$id %>%
 ft_get(from = "plos") %>%
 ft_chunks("publisher")

# Via entrez
res <- ft_get(c("10.3389/fnagi.2014.00130",'10.1155/2014/249309',
   '10.1155/2014/162024'), from='entrez')
res <- res %>% ft_collect()
ft_chunks(res, what="abstract")
ft_chunks(res, what="title")
ft_chunks(res, what="keywords")
ft_chunks(res, what="publisher")

(res <- ft_search(query='ecology', from='entrez'))
ft_get(res$entrez$data$doi, from='entrez') %>% ft_collect() %>% ft_chunks("title")
ft_get(res$entrez$data$doi[1:4], from='entrez') %>%
 ft_collect() %>% 
 ft_chunks("acknowledgments")
ft_get(res$entrez$data$doi[1:4], from='entrez') %>%
 ft_collect() %>% 
 ft_chunks(c('title','keywords'))

# From eLife
x <- ft_get(c('10.7554/eLife.04251', '10.7554/eLife.04986'), from='elife')
x %>% ft_chunks("abstract")
x %>% ft_chunks("publisher")
x %>% ft_chunks("journal_meta")
x %>% ft_chunks("acknowledgments")
x %>% ft_chunks("refs_dois")
x %>% ft_chunks(c("abstract", "executive_summary"))
# }