The function dictionary() is an alias for
sbo_dictionary().
This function builds a dictionary from the most frequent words in a
training corpus. Two pruning criteria can be applied:
Dictionary size, as implemented by the max_size argument.
Target coverage fraction (i.e. the fraction of the training corpus covered
by dictionary words), as implemented by the target argument.
If both criteria imply non-trivial cuts, the more restrictive criterion
applies.
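The interplay of max_size and target can be illustrated with a minimal base R sketch. It is not the sbo implementation. It assumes the corpus is already tokenized into a character vector of words and that "coverage" means the fraction of word occurrences accounted for by the kept words. The function name prune_dictionary is hypothetical.

```r
# Hypothetical sketch of the two pruning criteria (not sbo's actual code).
# Assumes `words` is a character vector of word tokens from the corpus.
prune_dictionary <- function(words, max_size = Inf, target = 1) {
  # Count occurrences and sort words by decreasing frequency
  freqs <- sort(table(words), decreasing = TRUE)
  # Cumulative fraction of word occurrences covered by the top-k words
  coverage <- cumsum(freqs) / sum(freqs)
  # Smallest k whose coverage reaches the target fraction
  k_target <- which(coverage >= target)[1]
  # The more restrictive (smaller) cut wins
  k <- min(max_size, k_target, length(freqs))
  names(freqs)[seq_len(k)]
}

w <- c("the", "cat", "the", "dog", "the", "cat")
prune_dictionary(w)                              # "the" "cat" "dog"
prune_dictionary(w, target = 0.8)                # "the" "cat"
prune_dictionary(w, max_size = 1, target = 0.8)  # "the"
```

In the last call the coverage criterion alone would keep two words, but max_size = 1 is stricter, so only one word survives.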
The .preprocess argument allows the user to apply a custom
transformation to the training corpus before word tokenization. The
EOS argument specifies a set of characters to be identified as
End-Of-Sentence tokens (and thus not treated as part of words).
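The order of operations (preprocess first, then split at EOS characters, then split into words) can be sketched in base R as follows. This is an illustrative assumption, not sbo's internal tokenizer. The function name tokenize_corpus is hypothetical, and treating EOS as a string of single characters and using a newline as an internal sentence break are simplifications made for this sketch.

```r
# Hypothetical sketch (not sbo's internal code): preprocess the corpus,
# split it into sentences at any EOS character, then split into words.
tokenize_corpus <- function(corpus, .preprocess = identity, EOS = "") {
  text <- .preprocess(corpus)
  if (nzchar(EOS)) {
    # Replace every EOS character with a newline, used here as a sentence
    # break (newlines already present in the corpus also act as breaks)
    for (ch in strsplit(EOS, "")[[1]]) {
      text <- gsub(ch, "\n", text, fixed = TRUE)
    }
  }
  sentences <- unlist(strsplit(text, "\n", fixed = TRUE))
  # Split each sentence on whitespace and drop empty strings
  words <- lapply(strsplit(trimws(sentences), "[[:space:]]+"),
                  function(w) w[w != ""])
  # Keep only non-empty sentences; EOS characters never appear in words
  words[lengths(words) > 0]
}

tokenize_corpus("The cat sat. The dog ran!", .preprocess = tolower, EOS = ".?!")
# list(c("the", "cat", "sat"), c("the", "dog", "ran"))
```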
The returned object is an sbo_dictionary object, which is a
character vector containing words sorted by decreasing corpus frequency.
The object also stores the original values of .preprocess and EOS as
attributes (i.e. the function used for corpus preprocessing and the
End-Of-Sentence characters used for sentence tokenization).
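The shape of such an object can be mimicked in base R. This is a sketch under stated assumptions: the constructor new_dictionary_sketch is hypothetical, and the attribute names ".preprocess" and "EOS" are assumed to mirror the argument names, which may differ from what sbo actually uses.

```r
# Hypothetical sketch: a character vector of words (most frequent first),
# tagged with class "sbo_dictionary" and carrying the tokenization settings.
# The attribute names ".preprocess" and "EOS" are assumptions.
new_dictionary_sketch <- function(words, .preprocess = identity, EOS = "") {
  structure(words,
            class = "sbo_dictionary",
            .preprocess = .preprocess,
            EOS = EOS)
}

d <- new_dictionary_sketch(c("the", "cat", "dog"),
                           .preprocess = tolower, EOS = ".?!")
attr(d, "EOS")                       # ".?!"
attr(d, ".preprocess")("The Cat")    # "the cat"
```

Keeping the preprocessing function and EOS characters alongside the words would presumably let later steps transform new text exactly as the training corpus was transformed.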