prune_vocabulary

Prune vocabulary

This function filters the input vocabulary, discarding very frequent and very infrequent terms. See the examples for the vocabulary function. The max_number_of_terms parameter can additionally cap the absolute size of the vocabulary, keeping only the most frequently used terms.

Usage
prune_vocabulary(vocabulary, term_count_min = 1L, term_count_max = Inf, doc_proportion_min = 0, doc_proportion_max = 1, max_number_of_terms = Inf)
Arguments
vocabulary
a vocabulary from the vocabulary function.
term_count_min
minimum number of occurrences over all documents.
term_count_max
maximum number of occurrences over all documents.
doc_proportion_min
minimum proportion of documents that should contain the term.
doc_proportion_max
maximum proportion of documents that should contain the term.
max_number_of_terms
maximum number of terms to keep in the vocabulary; the most frequently used terms are retained.
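
To make the interaction of the arguments concrete, the following base-R sketch applies the same kind of filtering to a plain data frame of per-term statistics. It is an illustration of the semantics described above, not the package's implementation: the function prune_sketch, the column names term_count and doc_count, and the sample numbers are all hypothetical, and treating the bounds as inclusive is an assumption.

```r
# Hypothetical per-term statistics. A real vocabulary object stores similar
# counts, but these column names and values are made up for illustration.
stats <- data.frame(
  term       = c("the", "cat", "sat", "zyzzyva", "mat"),
  term_count = c(120L, 14L, 9L, 1L, 7L),  # occurrences over all documents
  doc_count  = c(50L, 10L, 8L, 1L, 6L),   # documents containing the term
  stringsAsFactors = FALSE
)
n_docs <- 50L  # total number of documents in the (imaginary) corpus

# Sketch of the pruning rules; bounds are assumed to be inclusive.
prune_sketch <- function(stats, n_docs,
                         term_count_min = 1L, term_count_max = Inf,
                         doc_proportion_min = 0, doc_proportion_max = 1,
                         max_number_of_terms = Inf) {
  doc_prop <- stats$doc_count / n_docs
  keep <- stats$term_count >= term_count_min &
    stats$term_count <= term_count_max &
    doc_prop >= doc_proportion_min &
    doc_prop <= doc_proportion_max
  pruned <- stats[keep, , drop = FALSE]
  # Optionally cap the size, keeping the most frequent terms.
  if (is.finite(max_number_of_terms)) {
    pruned <- pruned[order(pruned$term_count, decreasing = TRUE), , drop = FALSE]
    pruned <- head(pruned, max_number_of_terms)
  }
  pruned
}

# Drop terms seen fewer than twice, drop terms present in more than half of
# the documents, then keep at most the two most frequent survivors.
pruned <- prune_sketch(stats, n_docs,
                       term_count_min = 2L,
                       doc_proportion_max = 0.5,
                       max_number_of_terms = 2)
print(pruned$term)
```

In this sketch the rare term is removed by term_count_min, the very common term by doc_proportion_max, and max_number_of_terms then trims what remains by frequency.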