Readers are functions for extracting textual content and metadata out
of elements delivered by a Source, and for constructing a
TextDocument. A reader must accept the following arguments in its signature:
- elem: a named list with the components content and
uri (as delivered by a Source).
- language: a character string giving the language.
- id: a character string giving a unique identifier for the created text document.
The element elem is typically provided by a source, whereas the language
and the identifier are normally provided by a corpus constructor (for the case
that elem$content does not carry information on these two essential items).
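A minimal sketch of a reader with this signature, assuming package tm is installed. The name readUpper is hypothetical; it upper-cases elem$content and wraps the result in a PlainTextDocument (tm's plain-text TextDocument), passing language and id through as metadata.

```r
library(tm)

# Hypothetical reader: upper-cases the element's content and builds a
# PlainTextDocument, passing language and id through as metadata.
readUpper <- function(elem, language, id) {
  PlainTextDocument(toupper(elem$content), id = id, language = language)
}

# An element shaped like the ones a Source delivers
elem <- list(content = "hello tm", uri = NULL)
doc <- readUpper(elem, language = "en", id = "doc1")

content(doc)            # "HELLO TM"
meta(doc, "id")         # "doc1"
meta(doc, "language")   # "en"
```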
If a reader expects configuration arguments, we can use a function
generator. A function generator is indicated by inheriting from the classes
FunctionGenerator and function. It allows us to process additional
arguments, store them in an environment, and return a reader function
with the well-defined signature described above that can still access
the additional arguments via lexical scoping. All corpus constructors in
package tm check whether the reader function is a function generator and,
if so, apply it to obtain a reader with the expected signature.
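A sketch of a function generator, again assuming tm is installed. readCase and its lower argument are hypothetical. The outer function captures lower in its environment and returns a reader with the (elem, language, id) signature; setting its class to c("FunctionGenerator", "function") marks it as a generator, so a corpus constructor such as VCorpus applies it (here with its default arguments) before reading.

```r
library(tm)

# Hypothetical generator: `lower` is a configuration argument that the
# returned reader reaches through lexical scoping.
readCase <- function(lower = TRUE) {
  force(lower)
  function(elem, language, id) {
    txt <- if (lower) tolower(elem$content) else toupper(elem$content)
    PlainTextDocument(txt, id = id, language = language)
  }
}
# Mark it as a function generator
class(readCase) <- c("FunctionGenerator", "function")

# Apply the generator explicitly to get a configured reader ...
upper_reader <- readCase(lower = FALSE)
d <- upper_reader(list(content = "Mixed Case", uri = NULL), "en", "d1")
content(d)               # "MIXED CASE"

# ... or pass the generator itself; the constructor detects the class
# and calls it (with default arguments) to obtain the reader.
corp <- VCorpus(VectorSource("Mixed Case"),
                readerControl = list(reader = readCase, language = "en"))
content(corp[[1]])       # "mixed case"
```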
getReaders() returns a character vector naming the readers provided by package tm.
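A short usage sketch, assuming tm is installed: getReaders() lists the reader names, match.fun() turns a name into the corresponding function, and that reader is handed to VCorpus through readerControl.

```r
library(tm)

readers <- getReaders()          # names of tm's built-in readers
print(readers)

# Look up a reader by name and pass it to a corpus constructor
reader <- match.fun("readPlain")
corp <- VCorpus(VectorSource(c("first text", "second text")),
                readerControl = list(reader = reader, language = "en"))
content(corp[[2]])               # "second text"
```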