Extract text from a file
extract_text(file, pages = NULL, area = NULL, password = NULL,
encoding = NULL, copy = FALSE)
file: A character string specifying the path or URL to a PDF file.
pages: An optional integer vector specifying pages to extract from.
area: An optional list, of length equal to the number of pages specified, where each entry contains a four-element numeric vector of coordinates (top, left, bottom, right) delimiting the area to extract text from on the corresponding page. As a convenience, a list of length 1 can be used to extract the same area from all (specified) pages.
password: Optionally, a character string containing a user password to access a secured PDF.
encoding: Optionally, a character string specifying an encoding for the text, to be passed to the assignment method of Encoding.
copy: Specifies whether the original local file(s) should be copied to tempdir() before processing. FALSE by default. The argument is ignored if file is a URL.
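The shape expected for area, including the length-1 convenience described above, can be illustrated in base R. This is only a sketch: normalize_area() is a hypothetical helper written for illustration, not a tabulizer function, and it makes no claim about how the package validates its input internally.

```r
# Hypothetical helper (not part of tabulizer): expand an `area` list so that
# it holds one c(top, left, bottom, right) vector per requested page.
normalize_area <- function(area, pages) {
  stopifnot(is.list(area), length(pages) >= 1)
  # Every entry must be a four-element numeric vector.
  ok <- vapply(area, function(a) is.numeric(a) && length(a) == 4, logical(1))
  if (!all(ok)) {
    stop("each area must be a numeric vector c(top, left, bottom, right)")
  }
  if (length(area) == 1) {
    # Convenience case: reuse the same area for all specified pages.
    area <- rep(area, length(pages))
  }
  if (length(area) != length(pages)) {
    stop("area must have length 1 or the same length as pages")
  }
  area
}

pages <- c(1, 3)
areas <- normalize_area(list(c(209.4, 140.5, 304.2, 500.8)), pages)
length(areas)
```

The resulting list has one entry per page, which is the form the area argument is documented to accept alongside pages.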
If pages = NULL (the default), a length 1 character vector; otherwise, a vector of length length(pages).
This function converts the contents of a PDF file into a single unstructured character string.
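Because the result is unstructured, some post-processing is usually needed before the text is useful. The sketch below assumes the returned string separates lines with newline characters; the txt value is a made-up stand-in for real output, and split_lines() is a hypothetical helper, not part of tabulizer.

```r
# Stand-in for a value returned by extract_text(); the content is invented.
txt <- "First line\nSecond line\n\nThird line\n"

# Hypothetical helper: split text into trimmed, non-empty lines.
split_lines <- function(x) {
  # Split on LF, tolerating an optional preceding CR (Windows line endings).
  parts <- unlist(strsplit(x, "\r?\n"))
  parts <- trimws(parts)
  # Drop lines that are empty after trimming.
  parts[nzchar(parts)]
}

text_lines <- split_lines(txt)
text_lines
```

The same helper also works on a per-page character vector, since strsplit() and unlist() flatten all pages into one vector of lines.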
# NOT RUN {
library("tabulizer")

# simple demo file
f <- system.file("examples", "text.pdf", package = "tabulizer")

# extract all text
extract_text(f)

# extract all text from page 1 only
extract_text(f, pages = 1)

# extract text from selected area only
extract_text(f, area = list(c(209.4, 140.5, 304.2, 500.8)))
# }