fulltext (version 0.1.8)

ft_extract_corpus: Extract text from one or more PDF documents into a tm Corpus or VCorpus.

Description

Extract text from one or more PDF documents into a tm Corpus or VCorpus.

Usage

ft_extract_corpus(paths, which = "xpdf", ...)

Arguments

paths

Path(s) to one or more PDF files.

which

Extraction tool to use: one of "gs" (Ghostscript) or "xpdf". Defaults to "xpdf".

...

Further arguments passed on to the readerControl parameter of tm's Corpus.

Value

A tm Corpus (VCorpus output is planned for a later version). In the examples, the corpus itself is accessed as res$data.
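
The returned corpus can be handed to the usual tm tools. The sketch below does not call ft_extract_corpus at all: it assumes the text has already been extracted and uses a hypothetical character vector (texts) as a stand-in, so it runs with only tm installed. It is meant to illustrate the kind of downstream work a tm corpus supports, not the internals of this function.

```r
# Hypothetical stand-in for text already extracted from two PDFs.
texts <- c(doc1 = "Full text mining of open access articles",
           doc2 = "Text extraction from pdf articles")

# Build a VCorpus directly from the character vector.
corp <- tm::VCorpus(tm::VectorSource(texts))

# Term-document matrix: one row per term, one column per document.
tdm <- tm::TermDocumentMatrix(corp)
tm::inspect(tdm)

# Terms that occur at least twice across the whole corpus.
tm::findFreqTerms(tdm, lowfreq = 2)
```

With a real result, the same calls would presumably be applied to res$data in place of corp, as the Examples section does for TermDocumentMatrix.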

See Also

ft_extract

Examples

# NOT RUN {
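# Locate the example PDF shipped with the package
path <- system.file("examples", "example1.pdf", package = "fulltext")

# Extract with xpdf; the corpus is stored in the data element of the result
(res <- ft_extract_corpus(path, "xpdf"))
tm::TermDocumentMatrix(res$data)

# Same extraction using Ghostscript
(res_gs <- ft_extract_corpus(path, "gs"))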
# }
