get_idf
From text2vec v0.3.0
by Dmitriy Selivanov
Inverse document-frequency scaling matrix
This function creates an inverse-document-frequency (IDF)
scaling matrix from a document-term matrix. The IDF is defined as follows:
idf = log(# documents in the corpus / (# documents where the term appears + 1))
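As a quick arithmetic check, the definition can be evaluated by hand in base R. This sketch assumes the formula is read as the logarithm of the whole ratio, with the natural log; the document counts and the names `n_docs` and `doc_freq` are made up for illustration.

```r
# Hypothetical corpus of 4 documents.
n_docs <- 4

# Hypothetical document frequencies: "common" appears in 3 documents,
# "rare" appears in only 1.
doc_freq <- c(common = 3, rare = 1)

# IDF with one added to each document frequency.
idf <- log(n_docs / (doc_freq + 1))
print(idf)
```

Under these assumptions the rarer term receives the larger weight, which is the point of IDF scaling.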
Usage
get_idf(dtm, log_scale = log, smooth_idf = TRUE)
Arguments
- dtm
- a document-term matrix of class dgCMatrix or dgTMatrix.
- log_scale
- function to use in calculating the IDF matrix. Usually log is used, but it might be worth trying log2.
- smooth_idf
- logical; smooth IDF weights by adding one to document frequencies, as if an extra document were seen containing every term in the collection exactly once. This prevents division by zero.
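The effect of log_scale and smooth_idf can be sketched with the Matrix package alone. The helper `sketch_idf_weights()` below is hypothetical: it is not part of text2vec, and only mirrors the behaviour described above as I understand it, returning a plain weight vector. The toy matrix deliberately includes a term that appears in no document.

```r
library(Matrix)

# Hypothetical helper (not text2vec's implementation): IDF weights per term.
sketch_idf_weights <- function(dtm, log_scale = log, smooth_idf = TRUE) {
  # Document frequency: number of documents with a non-zero entry per term.
  doc_freq <- colSums(dtm != 0)
  if (smooth_idf) {
    doc_freq <- doc_freq + 1
  }
  log_scale(nrow(dtm) / doc_freq)
}

# Toy document-term matrix: 3 documents x 3 terms; term 3 never appears.
dtm <- sparseMatrix(
  i = c(1, 2, 3, 1),
  j = c(1, 1, 1, 2),
  x = c(1, 2, 1, 1),
  dims = c(3, 3)
)

smoothed   <- sketch_idf_weights(dtm)                      # natural log, smoothed
base2      <- sketch_idf_weights(dtm, log_scale = log2)    # same ratios, base 2
unsmoothed <- sketch_idf_weights(dtm, smooth_idf = FALSE)  # division by zero for term 3
```

In this sketch, the unsmoothed weight for the unseen term is infinite, while the smoothed weights are all finite.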
Value
ddiMatrix
IDF scaling diagonal sparse matrix.
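To show how a diagonal scaling matrix of this class might be used, the sketch below builds one with `Matrix::Diagonal()` and right-multiplies a toy document-term matrix by it, which rescales each term column by its weight. The weights are computed inline under the smoothed natural-log assumption rather than by calling get_idf, so the example depends only on the Matrix package.

```r
library(Matrix)

# Toy document-term matrix: 3 documents x 2 terms.
dtm <- sparseMatrix(
  i = c(1, 2, 3, 1),
  j = c(1, 1, 1, 2),
  x = c(1, 2, 1, 1),
  dims = c(3, 2)
)

# Smoothed IDF weights (assumed formula), one per term.
idf <- log(nrow(dtm) / (colSums(dtm != 0) + 1))

# Diagonal sparse matrix holding the weights; its class is ddiMatrix.
idf_matrix <- Diagonal(x = idf)

# Right-multiplying scales column j of dtm by idf[j].
dtm_idf <- dtm %*% idf_matrix
```

Keeping the weights in a diagonal matrix means the same scaling can be reapplied to any other document-term matrix with the same columns.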