Split documents in a corpus into documents of one or more paragraphs.
split_documents(corpus, chunksize, preserveMetadata = TRUE)
corpus: A Corpus object.
chunksize: The maximum number of paragraphs each new document should contain.
preserveMetadata: Whether to preserve the metadata of the original documents.
Value: A Corpus object with split documents.
# NOT RUN {
file <- system.file("texts", "reut21578-factiva.xml", package = "tm.plugin.factiva")
corpus <- import_corpus(file, "factiva", language = "en")
split_documents(corpus, 3)
# }
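To illustrate what "at most chunksize paragraphs per new document" means, here is a minimal base R sketch. It is not the package's implementation: chunk_paragraphs is a hypothetical helper, and it assumes the paragraphs of one document are already available as a character vector.

```r
# Hypothetical helper, not part of the package: group a character vector of
# paragraphs into chunks holding at most `chunksize` paragraphs each.
chunk_paragraphs <- function(paragraphs, chunksize) {
  stopifnot(is.numeric(chunksize), chunksize >= 1)
  # Paragraph i is assigned to chunk number ceiling(i / chunksize)
  groups <- ceiling(seq_along(paragraphs) / chunksize)
  unname(split(paragraphs, groups))
}

# Seven made-up paragraphs split with chunksize = 3
paragraphs <- c("P1", "P2", "P3", "P4", "P5", "P6", "P7")
chunks <- chunk_paragraphs(paragraphs, 3)
lengths(chunks)
```

With seven paragraphs and a chunk size of 3, the last chunk holds the single leftover paragraph, which is why chunksize is an upper bound and not an exact size.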