
ft_get is a one-stop shop to fetch the full text of articles, either XML or PDFs. We have specific support for PLOS via the rplos package, Entrez via the rentrez package, and arXiv via the aRxiv package. For other publishers, ft_get has helpers that sort out links to full text based on user input. See Details for help on how to use this function.
ft_get(x, from = NULL, plosopts = list(), bmcopts = list(),
  entrezopts = list(), elifeopts = list(), cache = FALSE,
  backend = "rds", path = "~/.fulltext", ...)
# S3 method for character
ft_get(x, from = NULL, plosopts = list(),
  bmcopts = list(), entrezopts = list(), elifeopts = list(),
  cache = FALSE, backend = "rds", path = "~/.fulltext", ...)
# S3 method for list
ft_get(x, from = NULL, plosopts = list(), bmcopts = list(),
  entrezopts = list(), elifeopts = list(), cache = FALSE,
  backend = "rds", path = "~/.fulltext", ...)
# S3 method for ft
ft_get(x, from = NULL, plosopts = list(), bmcopts = list(),
  entrezopts = list(), elifeopts = list(), cache = FALSE,
  backend = "rds", path = "~/.fulltext", ...)
x: Identifiers for papers: either DOIs (or other IDs) as a list of character strings or a character vector, or an object of class ft, as returned from ft_search.
from: Source to query. Optional.
plosopts: PLOS options. See plos_fulltext.
bmcopts: BMC options. This parameter is DEPRECATED.
entrezopts: Entrez options. See entrez_search and entrez_fetch.
elifeopts: eLife options.
cache: (logical) Whether to cache results. If cache=TRUE, the raw XML (or whatever other format the article is in) is written to disk, then pulled from disk when further manipulations are done on the data. See also cache.
backend: (character) One of rds, rcache, or redis.
path: (character) Path to local folder. If the folder doesn't exist, we create it for you.
...: Further args passed on to GET.
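The caching behaviour described for cache, backend, and path can be pictured with a minimal sketch. This is not how ft_get is implemented internally; cache_fetch and fake_fetch are hypothetical names invented for illustration, and the sketch only assumes an rds-style backend that writes one file per identifier into a folder that is created on first use.

```r
# Hypothetical helper (not part of fulltext): cache the result of a fetch
# function as an .rds file, creating the cache folder on first use.
cache_fetch <- function(key, fetch_fun, path) {
  # Create the cache folder if it does not exist yet
  if (!dir.exists(path)) dir.create(path, recursive = TRUE)

  # Turn the identifier into a safe file name (DOIs contain slashes)
  file <- file.path(path, paste0(gsub("[^A-Za-z0-9._-]", "_", key), ".rds"))

  if (file.exists(file)) {
    # Cache hit: read from disk, no fetch needed
    return(readRDS(file))
  }

  # Cache miss: fetch, write to disk, then return the value
  value <- fetch_fun(key)
  saveRDS(value, file)
  value
}

# Stand-in fetcher that counts how often it is called (no network access)
calls <- 0
fake_fetch <- function(id) {
  calls <<- calls + 1
  paste0("<article id='", id, "'/>")
}

cache_dir <- tempfile("ft-cache-")  # fresh folder, so the first call is a miss
first  <- cache_fetch("10.1371/journal.pone.0086169", fake_fetch, cache_dir)
second <- cache_fetch("10.1371/journal.pone.0086169", fake_fetch, cache_dir)
```

The second call returns the same content as the first without invoking the fetcher again, which is the point of setting cache=TRUE when the same articles are processed repeatedly.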
An object of class ft_data (of type S3) with slots for each of the publishers. The returned object is split up by publisher because the full text format is the same within a publisher, which should facilitate text mining downstream, as different steps may be needed for each publisher's content.
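To illustrate why a per-publisher split helps downstream work, here is a small sketch that applies a different handler to each publisher's slot. The res list below is a stand-in, not a real ft_data object: the slot names follow the examples on this page, but the contents and the handlers list are invented for illustration.

```r
# Stand-in for a result split by publisher; contents are placeholders,
# not real full text and not the actual ft_data structure.
res <- list(
  hindawi       = c("10.1155/2014/292109", "10.1155/2014/162024"),
  f1000research = "10.12688/f1000research.6522.1"
)

# Hypothetical per-publisher handlers; a real pipeline would parse XML here
handlers <- list(
  hindawi       = function(ids) paste("hindawi parser:", length(ids), "article(s)"),
  f1000research = function(ids) paste("f1000research parser:", length(ids), "article(s)")
)

# Apply the matching handler to each publisher's slot
processed <- Map(function(pub, ids) handlers[[pub]](ids), names(res), res)
```

Because each slot holds content in a single format, each handler only needs to understand one publisher's layout.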
arXiv - The IDs passed are not actually DOIs, though they look similar. Because we can't determine unambiguously that the IDs passed in are from arXiv.org, you must always pass the from parameter.
bmc - has been a hot mess since the Springer acquisition. It has been removed as an officially supported plugin. Some DOIs from them may still work when passed in here, but there is no guarantee.
There are various ways to use ft_get:
- Pass in only DOIs and leave the from parameter NULL. This route first queries the Crossref API for the publisher of each DOI, then uses the appropriate method to fetch full text from that publisher. If a publisher is not found for a DOI, we return a message telling you that a publisher was not found.
- Pass in DOIs (or other pub IDs) and use the from parameter. This route is faster because we don't have to make an extra API call to Crossref to determine the publisher for each DOI. We go straight to getting full text based on the publisher.
- Use ft_search to search for articles, then pass that output to this function, which will use the info in that object. This behaves the same as the previous option in that each DOI has publisher info, so we know how to get full text for each DOI.
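The difference between the first two routes can be sketched as follows. This is only an illustration under stated assumptions: resolve_publisher and prefix_to_publisher are hypothetical, and the local prefix table (built from the DOIs in the examples on this page) stands in for the Crossref lookup that ft_get actually performs.

```r
# Hypothetical prefix table built from the DOIs in the examples on this page.
# ft_get itself queries the Crossref API instead of using a local table.
prefix_to_publisher <- c(
  "10.1371" = "plos",
  "10.7717" = "peerj",
  "10.7554" = "elife",
  "10.3897" = "pensoft",
  "10.5194" = "copernicus"
)

# Hypothetical helper: decide which publisher to use for each ID
resolve_publisher <- function(x, from = NULL) {
  if (!is.null(from)) {
    # The caller named the source, so no lookup is needed
    return(rep(from, length(x)))
  }
  # No source given: look up each DOI's prefix (the part before the first slash)
  prefix <- sub("/.*$", "", x)
  publisher <- unname(prefix_to_publisher[prefix])
  missing <- is.na(publisher)
  if (any(missing)) {
    message("No publisher found for: ", paste(x[missing], collapse = ", "))
  }
  publisher
}

resolve_publisher("10.7717/peerj.228")                  # lookup by prefix
resolve_publisher("cond-mat/9309029", from = "arxiv")   # lookup skipped
resolve_publisher("10.9999/unknown.1")                  # emits a message, returns NA
```

The arXiv call shows why from is required there: the ID has no DOI prefix to look up, so only the caller can say where it comes from.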
Note that some publishers are available via Entrez, but often not recent articles, where "recent" may be a few months to a year or so. In that case, make sure to specify the publisher, or else you'll get back no data.
# NOT RUN {
# If you just have DOIs and don't know the publisher
## PLOS
ft_get('10.1371/journal.pone.0086169')
## PeerJ
ft_get('10.7717/peerj.228')
## eLife
ft_get('10.7554/eLife.03032')
## some BMC DOIs will work, but some may not, who knows
ft_get(c('10.1186/2049-2618-2-7', '10.1186/2193-1801-3-7'))
## FrontiersIn
res <- ft_get(c('10.3389/fphar.2014.00109', '10.3389/feart.2015.00009'))
## Hindawi - via Entrez
res <- ft_get(c('10.1155/2014/292109','10.1155/2014/162024','10.1155/2014/249309'))
## F1000Research - via Entrez
ft_get('10.12688/f1000research.6522.1')
## Two different publishers via Entrez - retains publisher names
res <- ft_get(c('10.1155/2014/292109', '10.12688/f1000research.6522.1'))
res$hindawi
res$f1000research
## Pensoft
ft_get('10.3897/zookeys.499.8360')
### you'll need to specify the publisher for a DOI from a recent publication
ft_get('10.3897/zookeys.515.9332', from = "pensoft")
## Copernicus
out <- ft_get(c('10.5194/angeo-31-2157-2013', '10.5194/bg-12-4577-2015'))
out$copernicus
## arXiv - only pdf, you have to pass in the from parameter
res <- ft_get(x='cond-mat/9309029', from = "arxiv", cache=TRUE, backend="rds")
res %>% ft_extract
## bioRxiv - only pdf
res <- ft_get(x='10.1101/012476')
res$biorxiv
## Karger Publisher
ft_get('10.1159/000369331')
## CogentOA Publisher
ft_get('10.1080/23311916.2014.938430')
## MDPI Publisher
ft_get('10.3390/nu3010063')
ft_get('10.3390/nu7085279')
ft_get(c('10.3390/nu3010063', '10.3390/nu7085279')) # not working, only getting 1
# If you know the publisher, give DOI and publisher
## by default, PLOS gives back XML
ft_get('10.1371/journal.pone.0086169', from='plos')
## you can instead get json
ft_get('10.1371/journal.pone.0086169', from='plos', plosopts=list(wt="json"))
(dois <- searchplos(q="*:*", fl='id',
fq=list('doc_type:full',"article_type:\"research article\""), limit=5)$data$id)
ft_get(dois, from='plos')
ft_get(c('10.7717/peerj.228','10.7717/peerj.234'), from='entrez')
# elife
ft_get('10.7554/eLife.04300', from='elife')
ft_get(c('10.7554/eLife.04300', '10.7554/eLife.03032'), from='elife')
## search for elife papers via Entrez
dois <- ft_search("elife[journal]", from = "entrez")
ft_get(dois)
# Frontiers in Pharmacology (publisher: Frontiers)
doi <- '10.3389/fphar.2014.00109'
ft_get(doi, from="entrez")
# Hindawi Journals
ft_get(c('10.1155/2014/292109','10.1155/2014/162024','10.1155/2014/249309'), from='entrez')
res <- ft_search(query='ecology', from='crossref', limit=50,
crossrefopts = list(filter=list(has_full_text = TRUE,
member=98,
type='journal-article')))
out <- ft_get(res$crossref$data$DOI[1:20], from='entrez')
# Frontiers Publisher - Frontiers in Aging Neuroscience
res <- ft_get("10.3389/fnagi.2014.00130", from='entrez')
res$entrez
# Search entrez, get some DOIs
(res <- ft_search(query='ecology', from='entrez'))
res$entrez$data$doi
ft_get(res$entrez$data$doi[1], from='entrez')
ft_get(res$entrez$data$doi[1:3], from='entrez')
# Caching
res <- ft_get('10.1371/journal.pone.0086169', from='plos', cache=TRUE, backend="rds")
# Search entrez, and pass to ft_get()
(res <- ft_search(query='ecology', from='entrez'))
ft_get(res)
# }