The Text Interchange Formats (TIF) are a set of standards
that allow R text analysis packages to target defined inputs and outputs
for corpora, tokens, and document-term matrices.
Valid data.frame of tokens
Here, a data.frame of tokens means a data.frame object
that is compatible with the TIF.
A TIF-valid data.frame of tokens is expected to have
one key column (named doc_id)
that identifies the text each token belongs to,
and one or more feature columns describing each token.
The feature columns must contain at least the token itself.
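
As a rough illustration of the structure described above, the following base R sketch builds a small data.frame of tokens and checks its minimal shape. The column name token follows the TIF convention as understood here, the pos column is only an example of an additional feature, and looks_like_tif_tokens() is a hypothetical helper written for this example, not a function from any package. The actual TIF specification may impose further constraints (for example on encoding or column order) that this sketch does not verify.

```r
# Build a small tokens data.frame: one row per token.
tokens <- data.frame(
  doc_id = c("doc1", "doc1", "doc1", "doc2", "doc2"),
  token  = c("Text", "interchange", "formats", "Hello", "world"),
  pos    = c("NOUN", "NOUN", "NOUN", "INTJ", "NOUN"),  # optional extra feature column
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption for illustration only): checks that the
# object is a data.frame with character columns named doc_id and token.
looks_like_tif_tokens <- function(x) {
  is.data.frame(x) &&
    all(c("doc_id", "token") %in% names(x)) &&
    is.character(x$doc_id) &&
    is.character(x$token)
}

looks_like_tif_tokens(tokens)
```

Note that doc_id repeats across rows: it identifies the text a token came from, so every token of the same text shares the same value.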