pdfboxr (version 2.0.19)

read_chars: Read Characters from PDF

Description

Extracts all characters from a PDF file, together with information such as each character's position, font and rotation.

Usage

read_chars(
  file,
  pages = integer(),
  adjust = TRUE,
  size_pt = FALSE,
  digits = 2L,
  password = "",
  max_memory = -1L,
  temp_dir = tempdir()
)

Arguments

file

path to the PDF file (auto-expanded with path.expand()).

pages

an integer vector giving the pages to extract (the default is integer()).

adjust

a logical; if TRUE, the variables "x0", "y0", "height" and "width" are direction-adjusted.

size_pt

a logical indicating whether the font size should be returned in points (pt).

digits

an integer (of length 1) giving the number of digits to which the double variables (e.g. "x0", "y1") are rounded (the default is 2L, which should be enough).

password

a string providing the password of the file.

max_memory

an integer giving the maximum amount of main memory, in MB, to be used by pdfbox. The default is -1L, which means there is no limit. If a limit is set, pdfbox will try to stay below it by performing out-of-memory computations. Since the memory of the Java virtual machine is already limited, it is recommended to choose a value of max_memory below the memory limit of the virtual machine (options("java.parameters")). If the memory of the Java virtual machine is large enough, this option is never needed.

temp_dir

a character string giving the path to a temporary directory.
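
The relationship between max_memory and the Java virtual machine limit described above can be sketched as follows. This is a minimal illustration, assuming the JVM heap is configured through the usual options(java.parameters = ...) mechanism before the package is loaded; the 2 GB and 1024 MB values are arbitrary choices for the example, not recommendations from the package.

```r
# Assumption: the JVM reads this option once, at start-up, so it is set
# before pdfboxr (and with it the JVM) is loaded.
options(java.parameters = "-Xmx2g")  # illustrative JVM heap limit of 2 GB

jvm_limit_mb    <- 2048L  # the same limit expressed in MB
pdfbox_limit_mb <- 1024L  # illustrative value, kept below the JVM limit

# Only call the package if it is installed.
if (requireNamespace("pdfboxr", quietly = TRUE)) {
  pdf_file <- system.file("pdfs/cars.pdf", package = "pdfboxr")
  pdf <- pdfboxr::read_chars(
    pdf_file,
    max_memory = pdfbox_limit_mb,  # pdfbox tries to stay below this limit
    temp_dir   = tempdir()         # the documented default, shown explicitly
  )
}
```

If the JVM heap is already large enough for the documents being read, max_memory can be left at its default of -1L.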

Value

Returns an object of class "pdf_document".
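
A short sketch of how the returned object could be inspected. The internal layout of a "pdf_document" is not described on this page, so the example relies only on generic base R tools; is_pdf_document() is a hypothetical helper written for this illustration and is not part of pdfboxr.

```r
# Hypothetical helper (not part of pdfboxr): checks for the documented class.
is_pdf_document <- function(x) inherits(x, "pdf_document")

# Only call the package if it is installed.
if (requireNamespace("pdfboxr", quietly = TRUE)) {
  pdf_file <- system.file("pdfs/cars.pdf", package = "pdfboxr")
  pdf <- pdfboxr::read_chars(pdf_file, 2L)

  print(class(pdf))                # the class vector of the result
  stopifnot(is_pdf_document(pdf))  # matches the class stated under Value
  str(pdf, max.level = 1)          # top-level structure, no assumptions made
}
```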

Examples

# NOT RUN {
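library(pdfboxr)
# Locate the example PDF shipped with the package
pdf_file <- system.file("pdfs/cars.pdf", package = "pdfboxr")
# Read the characters of page 2 only
pdf <- read_chars(pdf_file, 2L)
pdf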
# }
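
A further sketch combining several of the documented arguments. The argument names come from the Usage section; the particular values are illustrative, and the example assumes the bundled cars.pdf has at least two pages, as the example above suggests by reading page 2.

```r
# Illustrative argument values (chosen for this example only).
args <- list(
  pages   = c(1L, 2L),  # extract pages 1 and 2
  digits  = 1L,         # round double variables to one digit
  size_pt = TRUE        # return the font size in pt
)

# Only call the package if it is installed.
if (requireNamespace("pdfboxr", quietly = TRUE)) {
  pdf_file <- system.file("pdfs/cars.pdf", package = "pdfboxr")
  pdf <- do.call(pdfboxr::read_chars, c(list(file = pdf_file), args))
  print(pdf)
}
```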
