plos_fulltext: Get full-text XML of PLOS papers given one or more DOIs

Description

Get the full-text XML of PLOS papers given one or more DOIs.

Usage

plos_fulltext(doi, callopts = list())
## S3 method for class 'plosft':
print(x, ...)

Arguments

doi

One or more DOIs

callopts

Curl options passed on to httr::GET

x

Input, of class plosft

...

Further args, ignored
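
Because doi accepts one or more DOIs, it can help to tidy a character vector before passing it in. The following is a minimal sketch in base R, not part of the package: the helper name keep_plos_dois is hypothetical, and it assumes that a PLOS DOI can be recognised by the prefix "10.1371/".

```r
# Hypothetical helper, not part of the package: trim whitespace, keep only
# strings that start with the PLOS DOI prefix "10.1371/", drop duplicates.
keep_plos_dois <- function(x) {
  x <- trimws(as.character(x))
  unique(x[grepl("^10\\.1371/", x)])
}

candidates <- c(
  "10.1371/journal.pone.0086169",
  " 10.1371/journal.pbio.1001845",   # leading space is trimmed
  "10.1000/not-a-plos-doi",          # different prefix, dropped
  "10.1371/journal.pone.0086169"     # duplicate, dropped
)
dois <- keep_plos_dois(candidates)
```

The resulting dois vector could then be passed as the doi argument.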

Value

An object of class plosft holding the full-text XML as a character string for each DOI.

Examples

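library("rplos")

# Full text for a single DOI
plos_fulltext(doi = '10.1371/journal.pone.0086169')
# Full text for several DOIs at once
plos_fulltext(c('10.1371/journal.pone.0086169', '10.1371/journal.pbio.1001845'))
# Full text for the DOIs returned by a search
dois <- searchplos(q = "*:*", fq = 'doc_type:full', limit = 20)$id
out <- plos_fulltext(dois)
# Subset by DOI (only works if that DOI is among the search results) or by position
out['10.1371/journal.pone.0013747']
out[1:2]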

# Extract the abstract text from the XML strings
library("XML")
lapply(out[2:3], function(x) {
  tmp <- xmlParse(x)
  xpathApply(tmp, "//abstract", xmlValue)
})

# Make a text corpus from the body text of each paper
library("tm")
out_parsed <- lapply(out, function(x) {
  tmp <- xmlParse(x)
  xpathApply(tmp, "//body", xmlValue)
})
tmcorpus <- Corpus(VectorSource(out_parsed))
# Build a document-term matrix and list terms that occur at least 50 times
(dtm <- DocumentTermMatrix(tmcorpus))
findFreqTerms(dtm, lowfreq = 50)
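
The XPath extraction used in the examples above can be tried without a network connection. This is a minimal sketch that assumes the XML package is installed; toy_xml is a hand-written, hypothetical stand-in for one element of out, and real PLOS XML is much larger and may be structured differently.

```r
library("XML")

# Hypothetical stand-in for a single full-text XML string (not real PLOS output)
toy_xml <- paste0(
  "<article>",
  "<front><article-meta>",
  "<title-group><article-title>A toy article</article-title></title-group>",
  "<abstract><p>Short abstract.</p></abstract>",
  "</article-meta></front>",
  "<body><sec><title>Introduction</title><p>Body text.</p></sec></body>",
  "</article>"
)

# Parse the string (asText = TRUE marks the input as XML content, not a file path)
doc <- xmlParse(toy_xml, asText = TRUE)

# xpathSApply simplifies the matches to a character vector
article_title  <- xpathSApply(doc, "//article-title", xmlValue)
abstract_text  <- xpathSApply(doc, "//abstract", xmlValue)
section_titles <- xpathSApply(doc, "//body//sec/title", xmlValue)
```

Using xpathSApply rather than xpathApply returns a character vector instead of a list, which is often more convenient when a single value per document is expected.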
