prune_vocabulary

Prune vocabulary

This function filters the input vocabulary and removes very frequent and very infrequent terms. See the examples in the documentation for the vocabulary function. The parameter vocab_term_max can also be used to limit the absolute size of the vocabulary, keeping only the most frequently used terms.
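The filtering described above can be sketched in base R. This is a hypothetical illustration, not the text2vec implementation. The helper prune_vocab_sketch() is invented for this example. It assumes the vocabulary is a data.frame with columns term, term_count and doc_count, and that the total number of documents, n_docs, is supplied separately. It also assumes that all bounds are inclusive and that vocab_term_max is applied after the other thresholds.

```r
# Hypothetical sketch: prune_vocab_sketch() is NOT part of text2vec.
# Assumed input: a data.frame with columns `term`, `term_count` (total
# occurrences) and `doc_count` (documents containing the term), plus the
# total number of documents `n_docs`.
prune_vocab_sketch <- function(vocab, n_docs,
                               term_count_min = 1L, term_count_max = Inf,
                               doc_proportion_min = 0, doc_proportion_max = 1,
                               doc_count_min = 1L, doc_count_max = Inf,
                               vocab_term_max = Inf) {
  # Share of documents that contain each term
  doc_proportion <- vocab$doc_count / n_docs
  # Keep a term only if it passes every threshold (bounds assumed inclusive)
  keep <- vocab$term_count >= term_count_min &
    vocab$term_count <= term_count_max &
    doc_proportion >= doc_proportion_min &
    doc_proportion <= doc_proportion_max &
    vocab$doc_count >= doc_count_min &
    vocab$doc_count <= doc_count_max
  pruned <- vocab[keep, , drop = FALSE]
  # Optionally cap the vocabulary size, keeping the most frequent terms
  if (is.finite(vocab_term_max)) {
    pruned <- pruned[order(pruned$term_count, decreasing = TRUE), , drop = FALSE]
    pruned <- head(pruned, vocab_term_max)
  }
  rownames(pruned) <- NULL
  pruned
}

# Toy vocabulary built from an imagined corpus of 4 documents
vocab <- data.frame(
  term       = c("the", "model", "vector", "rare"),
  term_count = c(50L, 12L, 8L, 1L),
  doc_count  = c(4L, 3L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Drop terms seen fewer than 2 times or in more than 90% of documents
pruned <- prune_vocab_sketch(vocab, n_docs = 4,
                             term_count_min = 2L, doc_proportion_max = 0.9)
print(pruned$term)
```

Here doc_proportion_max removes "the" because it occurs in every document. term_count_min removes "rare" because it occurs only once. That leaves "model" and "vector".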

Usage
prune_vocabulary(vocabulary, term_count_min = 1L, term_count_max = Inf,
  doc_proportion_min = 0, doc_proportion_max = 1, doc_count_min = 1L,
  doc_count_max = Inf, vocab_term_max = Inf)
Arguments
vocabulary

a vocabulary from the vocabulary function.

term_count_min

minimum number of occurrences over all documents.

term_count_max

maximum number of occurrences over all documents.

doc_proportion_min

minimum proportion of documents that should contain the term.

doc_proportion_max

maximum proportion of documents that should contain the term.

doc_count_min

minimum number of documents that should contain the term; terms found in fewer documents are removed.

doc_count_max

maximum number of documents that should contain the term; terms found in more documents are removed.

vocab_term_max

maximum number of terms in the vocabulary; if it is exceeded, only the most frequently used terms are kept.
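The doc_proportion_* and doc_count_* arguments express the same kind of threshold in different units. The calculation below assumes that a term's proportion is its doc_count divided by the total number of documents. Under that assumption, it shows how proportion bounds map onto document counts. The corpus size and counts are made up for illustration.

```r
# Hypothetical illustration (assumption: proportion = doc_count / n_docs)
n_docs    <- 200
doc_count <- c(the = 200L, model = 40L, vector = 15L, rare = 1L)

# Share of documents containing each term
doc_proportion <- doc_count / n_docs
print(doc_proportion)

# With 200 documents, doc_proportion_min = 0.01 means "in at least 2 documents"
# and doc_proportion_max = 0.5 means "in at most 100 documents"
keep <- doc_proportion >= 0.01 & doc_proportion <= 0.5
print(names(doc_count)[keep])
```

Under this assumption, a proportion bound scales with corpus size. A doc_count_* bound stays fixed as the corpus grows.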

See Also

vocabulary

Aliases
  • prune_vocabulary
Documentation reproduced from package text2vec, version 0.6, License: GPL (>= 2) | file LICENSE
