
Helper function to download training data from the official tessdata repository. On Linux, the fast training data can be installed directly with yum or apt-get.
Helper function to download training data from the contributed tessdata_contrib repository.
tesseract_download(
  lang,
  model = c("fast", "best"),
  datapath = NULL,
  progress = interactive()
)

tesseract_contributed_download(
  lang,
  model = c("fast", "best"),
  datapath = NULL,
  progress = interactive()
)
no return value, called for side effects
lang: three-letter code for the language; see the tessdata repository.
model: either "fast" or "best"; these are the only models currently supported. The latter downloads more accurate (but slower) trained models for Tesseract 4.0 or higher.
datapath: destination directory in which to store the downloaded file.
progress: whether to print progress while downloading.
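The usage signature lists model = c("fast", "best"), which in R conventionally means the first element is the default. The sketch below illustrates that convention with a hypothetical helper, resolve_model(), which is not part of the package; whether the package resolves the argument with match.arg() internally is an assumption.

```r
# Hypothetical helper, not part of the package: shows how a default of
# c("fast", "best") is conventionally resolved with match.arg().
resolve_model <- function(model = c("fast", "best")) {
  # Returns the first choice when `model` is left at its default and
  # raises an error for any value outside the listed choices.
  match.arg(model)
}

resolve_model()        # default: the first choice
resolve_model("best")  # explicit choice
```

Under this convention, leaving model unset would select the fast models, and a value other than the two listed would be rejected with an error.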
Tesseract uses training data to perform OCR. Most systems default to English training data. To improve OCR performance for other languages, you can install the training data from your distribution. For example, to install the Spanish training data:
tesseract-ocr-spa (Debian, Ubuntu)
tesseract-langpack-spa (Fedora, EPEL)
On Windows and macOS you can install languages using the tesseract_download function, which downloads training data directly from GitHub and stores it in the path on disk given by the TESSDATA_PREFIX variable.
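To see which languages are already available in that location, the environment variable can be inspected from R. The following is a minimal sketch using only base R; list_traineddata() is a hypothetical helper, and it assumes the .traineddata files sit directly in the directory named by TESSDATA_PREFIX.

```r
# Hypothetical helper: list the language codes for which a
# <lang>.traineddata file exists in a directory.
list_traineddata <- function(path = Sys.getenv("TESSDATA_PREFIX")) {
  # An unset variable yields "", so return an empty result in that case.
  if (!nzchar(path) || !dir.exists(path)) {
    return(character(0))
  }
  files <- list.files(path, pattern = "\\.traineddata$")
  # Strip the extension to recover the language codes.
  sub("\\.traineddata$", "", files)
}

list_traineddata()
```

An empty result means either that the variable is unset or that no training data has been stored there yet.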
Other tesseract: ocr(), tesseract()
# download the french training data
# this is wrapped around a \donttest{} block because otherwise the clang19
# CRAN check will fail with a "> 5 seconds" message
# \donttest{
dir <- tempdir()
tesseract_download("fra", model = "best", datapath = dir)
file <- system.file("examples", "french.png", package = "cpp11tesseract")
text <- ocr(file, engine = tesseract("fra", datapath = dir))
cat(text)
# }
# download the greek training data
# this is wrapped around a \donttest{} block because otherwise the clang19
# CRAN check will fail with a "> 5 seconds" message
# \donttest{
dir <- tempdir()
tesseract_contributed_download("grc_hist", model = "best", datapath = dir)
file <- system.file("examples", "polytonicgreek.png",
  package = "cpp11tesseract")
text <- ocr(file, engine = tesseract("grc_hist", datapath = dir))
cat(text)
# }
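Because the examples above download data on every run, it can be useful to skip the download when the file is already present. The wrapper below is a sketch only: download_if_missing() is a hypothetical name, and it assumes the download function stores its result as <lang>.traineddata inside datapath. The downloader is passed in as an argument so the sketch runs without the package or a network connection.

```r
# Hypothetical wrapper: call `download` only when <lang>.traineddata
# is not yet present in `datapath`.
download_if_missing <- function(lang, datapath, download) {
  target <- file.path(datapath, paste0(lang, ".traineddata"))
  if (!file.exists(target)) {
    # `download` is expected to accept `lang` and `datapath`, matching
    # the usage signature shown for tesseract_download().
    download(lang, datapath = datapath)
  }
  # Return the expected file path without printing it.
  invisible(target)
}
```

With the package loaded, a call such as download_if_missing("fra", tempdir(), tesseract_download) would fetch the French data once and reuse it afterwards, provided the file-naming assumption holds.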