Return a function which reads in a Microsoft Word document extracting its text.
readDOC(engine = c("antiword", "executable"), AntiwordOptions = "")
engine: a character string for the preferred DOC extraction engine (see Details).
AntiwordOptions: options passed over to the antiword executable.
A function with the following formals:
elem: a list with the named component uri, which must hold a valid file name.
language: a string giving the language.
The function returns a PlainTextDocument representing the text
and metadata extracted from the file named by uri.
Formally, this function is a function generator, i.e., it returns a
function (which reads in a text document) with a well-defined
signature, but which can still access the arguments passed to the
generator (e.g., options to antiword) via lexical scoping.
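The generator pattern can be sketched in base R as follows. This is an illustration only, not the source of readDOC: make_reader, its options argument, and the flag string are hypothetical names chosen for the example.

```r
# Hypothetical generator illustrating the pattern; not tm's actual code.
make_reader <- function(options = "") {
  # `options` lives in the enclosing environment of the function returned
  # below, so the reader can use it without it being part of its signature.
  function(elem, language, id) {
    list(uri = elem$uri, language = language, id = id, options = options)
  }
}

# The generator is called once with the options ...
reader <- make_reader(options = "--some-flag")

# ... and the returned reader always has the same three formals.
result <- reader(elem = list(uri = "example.doc"), language = "en", id = "doc1")
result$options
```

The returned reader keeps the fixed signature that the reader infrastructure expects, while the options stay available through the closure.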
Available DOC extraction engines are as follows.
"antiword": (default) Antiword utility as provided by the function
antiword in package antiword.
"executable": command-line antiword executable, which
must be installed and accessible on your system. It can convert
documents from Microsoft Word versions 2, 6, 7, 97, 2000, 2002 and 2003 to
plain text, and is available from http://www.winfield.demon.nl/. The
argument AntiwordOptions is passed over to the executable.
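A minimal sketch of calling the generated reader directly, assuming package tm (and its dependency NLP) is installed. The helper read_doc_text and the file name in the usage comment are hypothetical; the "executable" engine additionally assumes the antiword command-line tool is on the search path.

```r
# Hypothetical helper wrapping readDOC(); a sketch, not part of tm.
read_doc_text <- function(path, engine = "antiword", antiword_options = "") {
  stopifnot(requireNamespace("tm", quietly = TRUE), file.exists(path))
  # readDOC() is a generator: it returns the actual reader function.
  reader <- tm::readDOC(engine = engine, AntiwordOptions = antiword_options)
  # The reader takes a list with a `uri` component, a language, and an id.
  doc <- reader(elem = list(uri = path), language = "en", id = basename(path))
  # Extract the text from the resulting PlainTextDocument.
  NLP::content(doc)
}

# Usage (assumes a Word file named "report.doc" exists):
# text <- read_doc_text("report.doc", engine = "executable")
```

In practice the reader is more often handed to a corpus constructor through its reader control than called by hand; the direct call above only makes the elem, language, and id arguments visible.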
See Reader for basic information on the reader infrastructure
employed by package tm.