Interface to apply transformation functions (also denoted as mappings) to corpora.
# S3 method for PCorpus
tm_map(x, FUN, ...)
# S3 method for SimpleCorpus
tm_map(x, FUN, ...)
# S3 method for VCorpus
tm_map(x, FUN, ..., lazy = FALSE)
A corpus with FUN applied to each document in x. In case of lazy mappings only internal flags are set. Access of individual documents triggers the execution of the corresponding transformation function.
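The lazy behaviour described above can be sketched on a small in-memory corpus. This is a minimal illustration, assuming the tm package is installed; the two sample strings and the names `docs` and `lazy_docs` are made up for the example and are not part of the package.

```r
library(tm)

# Illustrative two-document corpus (sample text is an assumption, not package data)
docs <- VCorpus(VectorSource(c("Hello World", "Lazy MAPPING")))

# With lazy = TRUE the mapping is only recorded, not applied to all documents
lazy_docs <- tm_map(docs, content_transformer(tolower), lazy = TRUE)

# Accessing a single document triggers the transformation for that document
second <- content(lazy_docs[[2]])
print(second)
```

Only the VCorpus method accepts the lazy argument, so the sketch uses VCorpus rather than SimpleCorpus or PCorpus.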
x: a corpus.
FUN: a transformation function taking a text document (a character vector when x is a SimpleCorpus) as input and returning a text document (a character vector of the same length as the input vector for SimpleCorpus). The function content_transformer can be used to create a wrapper to get and set the content of text documents.
...: arguments to FUN.
lazy: a logical. Lazy mappings are mappings which are delayed until the content is accessed. This is useful for large corpora if only a few documents will be accessed, since it avoids the computationally expensive application of the mapping to all elements in the corpus.
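As a rough sketch of how FUN, content_transformer, and the extra arguments fit together, the following wraps a plain character function and passes further arguments to it through tm_map. The helper name `replace_pattern`, its `pattern` and `replacement` parameters, and the sample sentences are hypothetical and chosen only for illustration.

```r
library(tm)

# Illustrative corpus (sample text is an assumption, not package data)
docs <- VCorpus(VectorSource(c("Oil prices rose 3%", "OPEC met on 12 May")))

# Hypothetical helper: content_transformer() wraps a character-level function
# so that it reads and writes the content of each text document
replace_pattern <- content_transformer(
    function(x, pattern, replacement) gsub(pattern, replacement, x)
)

# Arguments after FUN are forwarded to it for every document
no_digits <- tm_map(docs, replace_pattern, pattern = "[0-9]+", replacement = "#")

first <- content(no_digits[[1]])
print(first)
```

Wrapping the function keeps document metadata intact, because only the content slot is replaced while the surrounding text document object is returned unchanged.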
See getTransformations for available transformations.
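A short sketch of how the predefined transformations might be listed and chained, assuming the tm package is installed. The sample sentence and the names `docs` and `cleaned` are illustrative only.

```r
library(tm)

# Names of the transformations shipped with tm
available <- getTransformations()
print(available)

# Illustrative one-document corpus (sample text is an assumption)
docs <- VCorpus(VectorSource("Crude   oil, again!"))

# Transformations can be chained by applying tm_map repeatedly
cleaned <- tm_map(docs, removePunctuation)
cleaned <- tm_map(cleaned, stripWhitespace)

result <- content(cleaned[[1]])
print(result)
```

Each call returns a corpus, so the order of the calls determines the order in which the transformations are applied.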
library(tm)
data("crude")
## Document access triggers the stemming function
## (i.e., all other documents are not stemmed yet)
if (requireNamespace("SnowballC")) {
    tm_map(crude, stemDocument, lazy = TRUE)[[1]]
}
## Use wrapper to apply character processing function
tm_map(crude, content_transformer(tolower))
## Generate a custom transformation function which takes the heading as new content
headings <- function(x)
    PlainTextDocument(meta(x, "heading"),
                      id = meta(x, "id"),
                      language = meta(x, "language"))
inspect(tm_map(crude, headings))