This function removes very common and very uncommon words from a document-term matrix.
transform_filter_commons(dtm, term_freq = c(uncommon = 0.001, common = 0.975))
dtm: a document-term matrix of class dgCMatrix or dgTMatrix.
term_freq: numeric vector of 2 values between 0 and 1. The first element is the frequency threshold for uncommon words; the second element is the frequency threshold for common words. Terms observed less frequently than the first value or more frequently than the second are filtered out.
See also: prune_vocabulary, transform_tf, transform_tfidf, transform_binary
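
The effect of the two term_freq thresholds can be illustrated with a minimal sketch. It is not the package's own implementation: filter_commons_sketch is a hypothetical helper, and it assumes that "frequency" means the share of documents in which a term appears. It relies only on the Matrix package.

```r
library(Matrix)

# Hypothetical helper (an assumption, not the package's code): keep only the
# terms whose document share lies between the two thresholds.
filter_commons_sketch <- function(dtm,
                                  term_freq = c(uncommon = 0.001, common = 0.975)) {
  # Share of documents (rows) that contain each term (column).
  doc_share <- Matrix::colSums(dtm > 0) / nrow(dtm)
  # Drop terms below the first threshold or above the second.
  keep <- doc_share >= term_freq[[1]] & doc_share <= term_freq[[2]]
  dtm[, keep, drop = FALSE]
}

# Toy document-term matrix: 4 documents, 3 terms.
# "the" occurs in all 4 documents, "model" in 2, "rare" in 1.
dtm <- sparseMatrix(
  i = c(1, 2, 3, 4, 1, 2, 3),
  j = c(1, 1, 1, 1, 2, 2, 3),
  x = rep(1, 7),
  dims = c(4, 3),
  dimnames = list(paste0("doc", 1:4), c("the", "model", "rare"))
)

filtered <- filter_commons_sketch(dtm, term_freq = c(uncommon = 0.3, common = 0.9))
colnames(filtered)  # "model"
```

With these toy thresholds, "the" (share 1.0) is dropped as too common and "rare" (share 0.25) as too uncommon, so only "model" remains. The default thresholds of 0.001 and 0.975 are far less aggressive and are intended for realistically sized corpora.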