
pdfboxr (version 2.0.19)

read_text: Read Text from PDF

Description

Read text from a PDF file.

Usage

read_text(
  file,
  pages = integer(),
  password = "",
  max_memory = -1L,
  temp_dir = tempdir()
)

Arguments

file

path to the PDF file (automatically expanded with path.expand()).

pages

an integer vector giving the pages to extract (default is integer()).

password

a string providing the password of the file.

max_memory

an integer giving the maximum amount of main memory in MB to be used by pdfbox. The default is -1L, which means there is no limit. If a limit is set, pdfbox will try to stay below it by performing out-of-memory computations. Since the memory of the Java virtual machine is already limited, it is recommended to choose a value of max_memory below the memory limit of the virtual machine (options("java.parameters")). If the memory of the Java virtual machine is big enough, this option is never needed.
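The relationship between the two limits can be sketched as follows. This is a minimal illustration, not taken from the package documentation: it assumes the Java virtual machine is started when pdfboxr is first loaded, so the heap size has to be set beforehand, and the values 1024 MB and 512 MB are arbitrary choices.

```r
# Assumption: the JVM heap limit is read once, when the JVM starts,
# so it has to be set before pdfboxr is loaded.
options(java.parameters = "-Xmx1024m")  # JVM heap limit: 1024 MB (illustrative)

# Only call the package if it is installed.
if (requireNamespace("pdfboxr", quietly = TRUE)) {
  pdf_file <- system.file("pdfs/cars.pdf", package = "pdfboxr")
  # Keep max_memory (in MB) below the JVM limit chosen above.
  pdf <- pdfboxr::read_text(pdf_file, max_memory = 512L)
}
```

With the default max_memory = -1L no limit is applied by pdfbox, and only the JVM limit is in effect.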

temp_dir

a character string giving the path to a temporary directory.

Value

Returns an object of class "data.frame".
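Because the result is an ordinary data.frame, it can be inspected with base R tools. The sketch below makes no assumption about the column names, which are not listed here, and uses only generic functions.

```r
pdf <- NULL  # stays NULL if pdfboxr is not installed

if (requireNamespace("pdfboxr", quietly = TRUE)) {
  pdf_file <- system.file("pdfs/cars.pdf", package = "pdfboxr")
  pdf <- pdfboxr::read_text(pdf_file)

  str(pdf)   # column names and types
  head(pdf)  # first few rows of the extracted text
  nrow(pdf)  # number of rows returned
}
```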

Examples

# NOT RUN {
pdf_file <- system.file("pdfs/cars.pdf", package = "pdfboxr")
pdf <- read_text(pdf_file)
pdf
# }
