tm (version 0.6-1)

Corpus: Corpora


Representing and computing on corpora.



Corpora are collections of documents containing (natural language) text. In packages that use the infrastructure provided by package tm, such corpora are represented via the virtual S3 class Corpus: such packages then provide S3 corpus classes extending the virtual base class (such as VCorpus, provided by package tm itself, or DCorpus, a distributed corpus class provided by package tm.plugin.dc).

All extension classes must provide accessors to extract subsets.
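To illustrate the extension mechanism, here is a minimal base-R sketch of a hypothetical class ListCorpus. It is not provided by tm or any plugin package. The class extends the virtual class Corpus by listing "Corpus" last in its class vector, as VCorpus does, and it supplies a subset accessor ([). The document accessor ([[), the length, as.list and print methods, and the internal list layout are assumptions chosen for this sketch. The class does not implement tm's other generics, such as metadata handling, so tm functions should not be expected to work on it.

```r
# Hypothetical minimal corpus class "ListCorpus" (NOT part of tm), sketching
# how an S3 class can extend the virtual base class "Corpus".

# Constructor: documents are stored in a plain list (an assumed layout); the
# class vector puts the concrete class first and the virtual base class last.
ListCorpus <- function(texts) {
  structure(list(content = as.list(texts)),
            class = c("ListCorpus", "Corpus"))
}

# Subset accessor: returns a corpus of the same class holding the selected
# documents. unclass() avoids re-dispatching to this class's own methods.
"[.ListCorpus" <- function(x, i) {
  structure(list(content = unclass(x)$content[i]),
            class = class(x))
}

# Individual-document accessor (assumed for the sketch).
"[[.ListCorpus" <- function(x, i) unclass(x)$content[[i]]

# Number of documents in the corpus.
length.ListCorpus <- function(x) length(unclass(x)$content)

# Plain list holding the documents.
as.list.ListCorpus <- function(x, ...) unclass(x)$content

# Compact printing instead of dumping the underlying list.
print.ListCorpus <- function(x, ...) {
  cat("<<ListCorpus>> documents:", length(x), "\n")
  invisible(x)
}

# Usage
corp <- ListCorpus(c("first document", "second document", "third one"))
inherits(corp, "Corpus")  # TRUE
length(corp)              # 3
sub <- corp[2:3]          # still a ListCorpus, hence also a Corpus
sub[[1]]                  # "second document"
```

Because the subset method rebuilds the object with class(x), a subset keeps both the concrete class and the virtual base class, so code that checks inherits(x, "Corpus") still accepts it.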