readTabular(mapping)
The names in mapping may include content, to access the document's content, and character strings, which are mapped to metadata entries.

readTabular returns a function with the following formals:
elem: a list with the component content, which must hold the document to be read in.
language: the language of the document.
id: an identifier for the document.

The function returns a PlainTextDocument representing the text and metadata extracted from elem$content. The arguments language and id are used as fallback if no corresponding metadata entries are found in elem$content.
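The closure and fallback behaviour described above can be illustrated with a minimal base R sketch. This is not tm's implementation: make_reader is a hypothetical helper, the returned value is a plain list rather than a PlainTextDocument, and the rule that a mapped column overrides the language and id arguments is an assumption made for illustration.

```r
# Hypothetical helper, not part of tm: builds a reader function from a mapping.
make_reader <- function(mapping) {
  stopifnot(is.list(mapping), "content" %in% names(mapping))
  # The returned function can still see `mapping` through lexical scoping.
  function(elem, language, id) {
    row <- elem$content
    # Start from the fallback values; mapped columns may override them below.
    meta <- list(id = id, language = language)
    for (name in setdiff(names(mapping), "content")) {
      meta[[name]] <- row[[mapping[[name]]]]
    }
    list(content = row[[mapping$content]], meta = meta)
  }
}

reader <- make_reader(list(content = "contents", heading = "title"))
row <- data.frame(contents = "content 1", title = "title 1",
                  stringsAsFactors = FALSE)
doc <- reader(list(content = row), language = "en", id = "id1")
```

Here doc$meta keeps the language and id passed by the caller because the mapping names no column for either of them.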
Reader for basic information on the reader infrastructure employed by package tm.

Vignette 'Extensions: How to Handle Custom File Formats'.
library(tm)

# Example data: one row per document
df <- data.frame(contents = c("content 1", "content 2", "content 3"),
                 title    = c("title 1", "title 2", "title 3"),
                 authors  = c("author 1", "author 2", "author 3"),
                 topics   = c("topic 1", "topic 2", "topic 3"),
                 stringsAsFactors = FALSE)

# Map the document content and metadata entries to columns of df
m <- list(content = "contents", heading = "title",
          author = "authors", topic = "topics")
myReader <- readTabular(mapping = m)

# Read the first element of the source with the constructed reader
ds <- DataframeSource(df)
elem <- getElem(stepNext(ds))
(result <- myReader(elem, language = "en", id = "id1"))
meta(result)