fulltext (version 0.1.6)

ft_extract_corpus: Extract text from one or more PDF documents into a tm Corpus or VCorpus.

Description

Extract text from one or more PDF documents into a tm Corpus or VCorpus.

Usage

ft_extract_corpus(paths, which = "xpdf", ...)

Arguments

paths
Path to one or more PDF files.
which
Extraction engine to use: one of "gs" or "xpdf" (the default).
...
Further arguments passed on to the readerControl parameter of Corpus.

Value

A tm Corpus (or, later, a VCorpus).

See Also

ft_extract

Examples

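## Not run:
# Path to an example PDF shipped with the package
path <- system.file("examples", "example1.pdf", package = "fulltext")

# Extract with xpdf, then build a term-document matrix from the result
(res <- ft_extract_corpus(path, "xpdf"))
tm::TermDocumentMatrix(res$data)

# Same extraction using gs
(res_gs <- ft_extract_corpus(path, "gs"))
## End(Not run)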
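
Since paths accepts one or more PDFs, the call can presumably also be given a character vector of several files. The following is a minimal sketch under stated assumptions, not a tested recipe: pdf_paths() is a hypothetical helper written for this illustration (it is not part of fulltext), the guard assumes an installed fulltext version that still exports ft_extract_corpus (0.1.6 does, per this page), the xpdf tools are assumed to be available on the system, and the corpus is read from res$data as in the example above.

```r
# Hypothetical helper (not part of fulltext): list every PDF in a directory.
pdf_paths <- function(dir) {
  list.files(dir, pattern = "\\.pdf$", full.names = TRUE, ignore.case = TRUE)
}

# Example PDFs shipped with fulltext; this is character(0) if the
# package is not installed or ships no PDFs in that directory.
paths <- pdf_paths(system.file("examples", package = "fulltext"))

# Only run the extraction when the package and the function are available.
if (length(paths) > 0 &&
    requireNamespace("fulltext", quietly = TRUE) &&
    "ft_extract_corpus" %in% getNamespaceExports("fulltext")) {
  # Assumption: a character vector of paths is handled in a single call.
  res <- fulltext::ft_extract_corpus(paths, which = "xpdf")
  # Assumption: the corpus sits in res$data, as in the example above.
  print(length(res$data))
}
```

Collecting the paths first keeps the extraction call itself unchanged whether one file or many are processed.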
