Representing and computing on corpora.
Corpora are collections of documents containing (natural language)
  text. In packages which employ the infrastructure provided by package
  tm, such corpora are represented via the virtual S3 class
  Corpus: such packages then provide S3 corpus classes extending the
  virtual base class (such as VCorpus provided by package tm
  itself).
All extension classes must provide accessors to extract subsets
  ([), individual documents ([[), and metadata
  (meta). The function length must return the number
  of documents, and as.list must construct a list holding the
  documents.
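The accessors listed above can be tried on a small in-memory corpus. The following is a minimal sketch, assuming package tm is installed; the three sample strings and the variable names are made up for illustration.

```r
library(tm)

# Hypothetical sample texts, used only for illustration.
texts <- c("The first document.", "The second document.", "The third document.")
corp <- VCorpus(VectorSource(texts))

length(corp)           # number of documents in the corpus

sub <- corp[1:2]       # `[` extracts a subset, which is again a corpus
doc <- corp[[1]]       # `[[` extracts a single document
content(doc)           # the text held by that document

docs <- as.list(corp)  # a plain list holding the documents
```

Here `[` returns a corpus of the same class, while `[[` returns one document object whose text can be read with content.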
A corpus can have two types of metadata (accessible via meta).
  Corpus metadata contains corpus-specific metadata in the form of tag-value
  pairs. Document-level metadata contains document-specific metadata but
  is stored in the corpus as a data frame. Document-level metadata is typically
  used for semantic reasons (e.g., classifications of documents form a separate
  entity because of high-level information such as the range of possible values)
  or for performance reasons (a single access instead of extracting the metadata
  of each document).
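The two kinds of metadata can be illustrated with a VCorpus. This is a minimal sketch, assuming package tm is installed; the tags "creator" and "topic" and their values are hypothetical and chosen only for the example.

```r
library(tm)

# Hypothetical sample texts, used only for illustration.
corp <- VCorpus(VectorSource(c("Text about cats.", "Text about dogs.")))

# Corpus metadata: a tag-value pair that applies to the corpus as a whole.
meta(corp, "creator", type = "corpus") <- "Jane Doe"
meta(corp, "creator", type = "corpus")

# Document-level metadata: one value per document, kept in a data frame
# inside the corpus ("indexed" is the default type for a VCorpus).
meta(corp, "topic") <- c("cats", "dogs")
meta(corp)
```

Reading the document-level metadata with a single meta call returns the whole data frame at once, which is the performance case mentioned above.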
The function Corpus is a convenience alias to SimpleCorpus or
  VCorpus, depending on the arguments provided.
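Which class Corpus returns can be checked by inspecting the result. The sketch below assumes package tm is installed and uses made-up sample texts; it makes no claim about which class is chosen for a given source and simply prints it.

```r
library(tm)

# Hypothetical sample texts, used only for illustration.
texts <- c("A first short text.", "A second short text.")

# Let Corpus choose the concrete class from the arguments supplied.
corp <- Corpus(VectorSource(texts))
class(corp)

# Request a VCorpus explicitly when that class is required.
vcorp <- VCorpus(VectorSource(texts))
class(vcorp)
```

Calling the constructor by name avoids depending on the alias's choice when code relies on behaviour specific to one class.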
SimpleCorpus, VCorpus, and PCorpus
  for the corpus classes provided by package tm.
DCorpus for a distributed corpus class provided by
  package tm.plugin.dc.