orderanalyzer (version 1.0.0)

extractText: Extracts the text from a PDF file

Description

This function extracts text from a PDF document and returns it as a single string, as a list of lines, and as a list of words. It uses 'pdftools' to extract content from text-based PDF files and 'tesseract' to extract content from image-based PDF files.
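The text-versus-image distinction described above can be illustrated with a rough sketch. This is not the body of extractText(). It assumes that a PDF counts as image-based when 'pdftools' finds no non-whitespace text, and it relies on pdftools::pdf_ocr_text(), which calls 'tesseract' internally. The helper names has_text_layer, split_text and extract_pdf_text_sketch are hypothetical, and the sketch returns plain character vectors rather than data tables.

```r
# Hypothetical helper: TRUE if any page contains non-whitespace characters.
has_text_layer <- function(pages) {
  any(nzchar(trimws(pages)))
}

# Hypothetical helper: split raw text into non-empty lines and words (base R).
split_text <- function(text) {
  lines <- trimws(unlist(strsplit(text, "\r?\n")))
  lines <- lines[nzchar(lines)]
  words <- unlist(strsplit(lines, "\\s+"))
  list(lines = lines, words = words)
}

# Sketch of the fallback logic (an assumption, not the package's own code):
# try the embedded text layer first, then fall back to OCR.
extract_pdf_text_sketch <- function(file, language = "eng") {
  pages <- pdftools::pdf_text(file)
  type <- "text"
  if (!has_text_layer(pages)) {
    # pdf_ocr_text() renders each page and runs tesseract on the image
    pages <- pdftools::pdf_ocr_text(file, language = language)
    type <- "image"
  }
  text <- paste(pages, collapse = "\n")
  c(list(text = text, type = type), split_text(text))
}
```

Calling extract_pdf_text_sketch() requires 'pdftools' (and 'tesseract' for the OCR branch) to be installed; the two helpers use base R only.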

Usage

extractText(file)

Value

A list containing the extracted text, a data table of the lines, a data table of the words, and the type and language of the document.

Arguments

file

Path to the PDF file

Examples

file <- system.file("extdata", "OrderDocument_en.pdf", package = "orderanalyzer")
text <- extractText(file)
text$words
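To see everything the function returns, not only the words, the result can be inspected without assuming any element names beyond `words`, which the example above uses. The following is a small sketch that skips the call when the package or its bundled sample file is not available.

```r
# system.file() returns "" when the package or the file cannot be found
file <- system.file("extdata", "OrderDocument_en.pdf", package = "orderanalyzer")

if (nzchar(file)) {
  text <- orderanalyzer::extractText(file)
  print(names(text))          # names of the list elements
  str(text, max.level = 1)    # one-line summary per element
  print(head(text$words))     # first rows of the words table
} else {
  message("Sample PDF not found; is 'orderanalyzer' installed?")
}
```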