arete (version 0.1)

OCR_document: Scan PDF with optical character recognition (OCR)

Description

Extract text stored as images in a PDF using optical character recognition (OCR) software. Two methods are currently available: method = "nougat" and method = "tesseract".

Usage

OCR_document(in_path, out_path, method = "nougat", verbose = TRUE)

Value

character. The extracted text.

Arguments

in_path

character. Path to a file with species data in either PDF or TXT format, e.g. ./folder/file.pdf

out_path

character. Path to the directory where the output is written, e.g. path/to/dir

method

character. Method used for the OCR, either "nougat" (the default) or "tesseract".

verbose

logical. Print the output when processing finishes.

Details

For now, OCR processing of documents is only supported on Linux systems.

See Also

arete_setup

Examples

if (FALSE) {
OCR_document("path/to/file.pdf", "path/to/dir")
}
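
Because OCR processing is only supported on Linux, a call can be guarded so that it is skipped elsewhere. The following is a minimal sketch, not part of arete: the helper ocr_is_supported() is hypothetical, the paths are the placeholders from the example above, and method and verbose are used as listed under Usage. It does not cover any preparation performed by arete_setup.

```r
# Hypothetical helper (not part of arete): TRUE only when an OCR call can be attempted.
ocr_is_supported <- function(in_path) {
  on_linux  <- identical(Sys.info()[["sysname"]], "Linux")   # OCR is Linux-only for now
  has_arete <- requireNamespace("arete", quietly = TRUE)     # package must be installed
  on_linux && has_arete && file.exists(in_path)              # input file must exist
}

in_path  <- "path/to/file.pdf"  # placeholder input file
out_path <- "path/to/dir"       # placeholder output directory

if (ocr_is_supported(in_path)) {
  # Select the non-default method and suppress printing of the output
  extracted <- arete::OCR_document(in_path, out_path,
                                   method = "tesseract", verbose = FALSE)
} else {
  message("Skipping OCR: requires Linux, the arete package, and an existing input file.")
}
```

With the placeholder paths the guard fails and only the message is shown; replacing them with a real PDF and output directory on a Linux system lets the call proceed.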