pdfx

pdfx_html

pdfx_targz

file

(character) Path to a file, or files, on your machine. Required.

what

(character) One of <code>parsed</code> or <code>text</code>.

...

Further arguments passed to <code><a rd-options="httr" href="/link/GET?package=fulltext&version=0.1.8&to=httr" data-mini-rdoc="httr::GET">GET</a></code>. These are passed unnamed, e.g. <code>verbose()</code> or <code>timeout(3)</code>.

input

Output from the <code>pdfx</code> function.

write_path

Uses a web service provided by Utopia at
<a href="http://pdfx.cs.man.ac.uk/">http://pdfx.cs.man.ac.uk/</a>. Beware, this can be quite slow.
<code>pdfx</code> posts the PDF from your machine to the web service,
<code>pdfx_html</code> takes the output of <code>pdfx</code> and returns
an HTML version of the extracted text, and <code>pdfx_targz</code>
returns a tar.gz version of the extracted text. This will not work
with PDFs that are scans of text or that consist mostly of images.
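
The three functions described above could be chained roughly as follows. This is a minimal sketch, not a tested recipe: it assumes fulltext 0.1.8 is installed and the web service is reachable, the file name <code>article.pdf</code> and the output path are hypothetical, and the pairing of each argument with each function is inferred from the argument descriptions on this page. The calls only run when the package, the functions, and the input file are all present.

```r
# Sketch only: assumes fulltext 0.1.8 and a reachable PDFX web service.
pdf_path <- "article.pdf"                            # hypothetical text-based (not scanned) PDF
tar_path <- file.path(tempdir(), "article.tar.gz")   # hypothetical output location

# Run the calls only if the package, the pdfx() function, and the file exist.
can_run <- requireNamespace("fulltext", quietly = TRUE) &&
  exists("pdfx", envir = asNamespace("fulltext"), inherits = FALSE) &&
  file.exists(pdf_path)

if (can_run) {
  # Post the PDF to the web service; `what` is one of "parsed" or "text".
  out <- fulltext::pdfx(file = pdf_path, what = "parsed")

  # Ask for an HTML version of the extracted text, using the pdfx() output.
  html <- fulltext::pdfx_html(input = out)

  # Ask for a tar.gz version of the extracted text, written to tar_path
  # (assumed meaning of `write_path`).
  fulltext::pdfx_targz(input = out, write_path = tar_path)
}
```

Because the service can be slow, it may be worth passing an unnamed <code>timeout()</code> through the <code>...</code> argument, as the argument description suggests.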

Provides a single interface to many sources of full text
'scholarly' data, including 'Biomed Central', Public Library of
Science, 'Pubmed Central', 'eLife', 'F1000Research', 'PeerJ',
'Pensoft', 'Hindawi', 'arXiv' preprints, and more. Includes
functionality for searching for articles, downloading full or partial
text, downloading supplementary materials, and converting to various
data formats used in and outside of R.

Scott Chamberlain

fulltext

Full Text of 'Scholarly' Articles Across Many Data Sources

pdfx: PDF-to-XML conversion of scientific articles using pdfx
