text2vec (version 0.2.0)

dtm_get_idf: Inverse Document-Frequency scaling matrix construction

Description

Creates an Inverse Document-Frequency (idf) scaling matrix from a Document-Term matrix. With smoothing, idf = log(# documents in the corpus / (# documents where the term appears + 1)). For examples see get_dtm.

Usage

dtm_get_idf(dtm, log_scale = log, smooth_idf = T)

Arguments

dtm
dgCMatrix - Document-Term matrix.
log_scale
function to use in the idf calculation. Usually log is used; log2 is also worth trying.
smooth_idf
logical. If TRUE, smooth idf weights by adding one to document frequencies, as if an extra document had been seen containing every term in the collection exactly once. Prevents division by zero.
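
To make the effect of log_scale and smooth_idf concrete, the following is a small base-R sketch that applies the formula from the Description to hand-picked numbers. It does not call text2vec; the names n_docs and doc_freq, and the values in them, are made up for illustration.

```r
# Illustration only: hypothetical corpus size and document frequencies.
n_docs   <- 4
doc_freq <- c(rare = 1, common = 2, everywhere = 4, unseen = 0)

# Smoothed weights (the + 1 from the documented formula), natural log.
idf_smooth_ln <- log(n_docs / (doc_freq + 1))

# Same weights with log2 as the scaling function: a constant rescaling.
idf_smooth_l2 <- log2(n_docs / (doc_freq + 1))

# Without smoothing, a term with document frequency 0 divides by zero.
# R does not raise an error here; the weight becomes Inf.
idf_raw_ln <- log(n_docs / doc_freq)

print(rbind(idf_smooth_ln, idf_smooth_l2, idf_raw_ln))
```

Two things follow from the arithmetic: switching from log to log2 only rescales every weight by the same constant, and with the + 1 in the denominator a term that appears in every document gets a slightly negative weight, since log(4 / 5) is below zero.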

Value

  • ddiMatrix - diagonal sparse idf scaling matrix.
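
The sketch below shows why a diagonal matrix is a convenient return type: right-multiplying a Document-Term matrix by it rescales each term column by its idf weight. It uses only the Matrix package. idf_sketch is a hypothetical helper written from the formula in the Description, not the package's source, and the toy matrix is invented for the example.

```r
library(Matrix)

# Toy Document-Term matrix: 3 documents (rows) x 4 terms (columns).
dtm <- sparseMatrix(
  i = c(1, 1, 2, 2, 3, 3),
  j = c(1, 2, 1, 3, 1, 4),
  x = c(2, 1, 1, 3, 1, 1),
  dims = c(3, 4),
  dimnames = list(paste0("doc", 1:3), c("a", "b", "c", "d"))
)

# Hypothetical helper that mirrors the documented formula (an assumption,
# not the text2vec implementation).
idf_sketch <- function(dtm, log_scale = log, smooth_idf = TRUE) {
  # Document frequency: number of rows with a non-zero entry in each column.
  doc_freq <- unname(colSums(dtm != 0))
  denom <- if (smooth_idf) doc_freq + 1 else doc_freq
  # Diagonal() with a numeric x yields a ddiMatrix.
  Diagonal(x = log_scale(nrow(dtm) / denom))
}

idf <- idf_sketch(dtm)

# Column j of the result is column j of dtm multiplied by idf[j, j].
dtm_scaled <- dtm %*% idf
```

Keeping the weights as a diagonal matrix means the scaling is a single sparse matrix product and the result stays sparse.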

See Also

dtm_get_tf, get_dtm