tidy.Corpus
From tidytext v0.1.3
by Julia Silge
Tidy a Corpus object from the tm package
Tidy a Corpus object from the tm package. Returns a data frame
with one-row-per-document, with a text column containing
the document's text, and one column for each local (per-document)
metadata tag. For corpus objects from the quanteda package,
see tidy.corpus.
Usage
# S3 method for Corpus
tidy(x, collapse = "\n", ...)
Arguments
- x
A Corpus object, such as a VCorpus or PCorpus
- collapse
A string used to collapse the text within each document (if a document has multiple lines). Give NULL to not collapse strings, in which case the text column will end up as a list column if there are multi-line documents.
- ...
Extra arguments, not used
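As a rough illustration of the collapse argument, the sketch below builds a tiny two-document corpus from temporary files and tidies it three ways. The directory, file names, and text are made up for this illustration and are not part of the package; it also assumes that both tm and tidytext are installed and that the default plain-text reader stores each line of a file as a separate string.

```r
# Minimal sketch: effect of `collapse` on a multi-line document.
# The temp directory, file names, and text are hypothetical.
if (requireNamespace("tm", quietly = TRUE) &&
    requireNamespace("tidytext", quietly = TRUE)) {
  library(tm)
  library(tidytext)

  corpus_dir <- tempfile("tidy_corpus_")
  dir.create(corpus_dir)
  writeLines(c("first line", "second line"), file.path(corpus_dir, "a.txt"))
  writeLines("only line", file.path(corpus_dir, "b.txt"))

  corp <- VCorpus(DirSource(corpus_dir, encoding = "UTF-8"))

  # default: lines within a document are joined with "\n"
  text_default <- tidy(corp)$text

  # join lines with a space instead
  text_space <- tidy(corp, collapse = " ")$text

  # no collapsing: text stays a list column, one character vector per document
  text_list <- tidy(corp, collapse = NULL)$text
}
```

With the default, each document becomes a single string; with collapse = NULL, the two-line document should keep its lines as separate elements, which is why the column has to be a list.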
Examples
# NOT RUN {
library(dplyr) # displaying tbl_dfs
if (requireNamespace("tm", quietly = TRUE)) {
library(tm)
# tm package examples
txt <- system.file("texts", "txt", package = "tm")
ovid <- VCorpus(DirSource(txt, encoding = "UTF-8"),
readerControl = list(language = "lat"))
ovid
tidy(ovid)
# choose different options for collapsing text within each
# document
tidy(ovid, collapse = "")$text
tidy(ovid, collapse = NULL)$text
# another example from Reuters articles
reut21578 <- system.file("texts", "crude", package = "tm")
reuters <- VCorpus(DirSource(reut21578),
readerControl = list(reader = readReut21578XMLasPlain))
reuters
tidy(reuters)
}
# }