orderanalyzer (version 1.0.0)

Extracting Order Position Tables from PDF-Based Order Documents

Description

Functions for extracting text and tables from PDF-based order documents. The package provides an n-gram-based approach for identifying the language of an order document and uses the R package 'pdftools' to extract the document's text. If the PDF contains only an image (because it is a scanned document), the R package 'tesseract' is used for OCR. The package also identifies and extracts order position tables from order documents using a clustering approach.
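
To illustrate the general idea behind n-gram-based language identification, here is a minimal base-R sketch. It is not the package's implementation: the helper names (char_trigrams, trigram_profile, guess_language), the toy reference sentences, and the overlap score are all assumptions made for this example. Each language is represented by the set of its most frequent character trigrams, and a new text is assigned to the language whose profile covers the largest share of the text's trigrams.

```r
# Hypothetical helper: split a text into overlapping character trigrams.
char_trigrams <- function(text) {
  text <- tolower(gsub("\\s+", " ", text))  # normalise case and whitespace
  n <- nchar(text)
  if (n < 3) return(character(0))
  substring(text, 1:(n - 2), 3:n)
}

# Hypothetical helper: keep the k most frequent trigrams of a reference text.
# The toy references below are so short that k = 300 keeps every trigram.
trigram_profile <- function(text, k = 300) {
  counts <- sort(table(char_trigrams(text)), decreasing = TRUE)
  names(counts)[seq_len(min(k, length(counts)))]
}

# Toy reference texts (assumed for illustration; real profiles need large corpora).
reference <- list(
  english = "the order number and the delivery date are listed in the table below with the price of each item",
  german  = "die bestellnummer und das lieferdatum stehen in der tabelle unten mit dem preis pro artikel"
)
profiles <- lapply(reference, trigram_profile)

# Score each language by the share of the text's trigrams found in its profile.
guess_language <- function(text, profiles) {
  grams <- char_trigrams(text)
  if (length(grams) == 0) return(NA_character_)
  scores <- vapply(profiles, function(p) mean(grams %in% p), numeric(1))
  names(scores)[which.max(scores)]
}

guess_language("please confirm the delivery date of the order", profiles)
guess_language("das lieferdatum der bestellnummer", profiles)
```

The overlap share is only one of several possible scores; rank-based distances between trigram profiles are another common choice.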

Install

install.packages('orderanalyzer')

Monthly Downloads

120

Version

1.0.0

License

GPL-3

Maintainer

Michael Scholz

Last Published

December 12th, 2024

Functions in orderanalyzer (1.0.0)

identifyLanguage

Identifies the language of a given text based on frequent trigrams

extractTables

Extract tables from a given words-dataframe

orderanalyzer-package

Extracting order position tables from PDF-based order documents

extractText

Extracts the text from a PDF file
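
The description mentions that text is read with 'pdftools' and that 'tesseract' is used for OCR when a PDF contains only a scanned image. A rough sketch of such a fallback follows. It does not reproduce how extractText works internally: the function name extract_pdf_text and its parameters are hypothetical, and it assumes the 'pdftools' and 'tesseract' packages plus the requested OCR language data are installed.

```r
# Hypothetical helper, not part of orderanalyzer: read embedded text first,
# and fall back to OCR only when no page contains any text.
extract_pdf_text <- function(pdf_path, ocr_language = "eng", dpi = 300) {
  # pdf_text() returns one character string per page.
  pages <- pdftools::pdf_text(pdf_path)
  if (any(nzchar(trimws(pages)))) {
    return(pages)
  }

  # No embedded text: render every page to a temporary PNG file.
  png_files <- tempfile(sprintf("page_%03d_", seq_along(pages)), fileext = ".png")
  on.exit(unlink(png_files), add = TRUE)
  pdftools::pdf_convert(pdf_path, format = "png", dpi = dpi,
                        filenames = png_files, verbose = FALSE)

  # Run OCR on each rendered page and return one string per page.
  engine <- tesseract::tesseract(ocr_language)
  vapply(png_files, function(f) tesseract::ocr(f, engine = engine),
         character(1), USE.NAMES = FALSE)
}
```

Calling extract_pdf_text("order.pdf") on a file of your own would return a character vector with one element per page, whichever branch is taken. Treating the document as scanned only when every page is empty is a simplification; mixed documents would need a per-page check.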