
tm (version 0.7-5)

weightTfIdf: Weight by Term Frequency - Inverse Document Frequency

Description

Weight a term-document matrix by term frequency - inverse document frequency.

Usage

weightTfIdf(m, normalize = TRUE)

Arguments

m

A TermDocumentMatrix in term frequency format.

normalize

A Boolean value indicating whether the term frequencies should be normalized.

Value

The weighted matrix.
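
As a rough illustration of the call signature above, the following sketch applies the weighting to the `crude` example corpus that ships with tm. It assumes the tm package is installed; the variable names `tdm` and `tdm_tfidf` are chosen here for illustration only.

```r
library(tm)

# Load the small Reuters sample corpus bundled with tm.
data("crude")

# Build a term-document matrix in plain term frequency format.
tdm <- TermDocumentMatrix(crude)

# Re-weight it by tf-idf, with term frequencies normalized per document.
tdm_tfidf <- weightTfIdf(tdm, normalize = TRUE)

# Show a small corner of the weighted matrix.
inspect(tdm_tfidf[1:5, 1:3])
```

The weighting can also be requested while the matrix is built, for example with `TermDocumentMatrix(crude, control = list(weighting = weightTfIdf))`.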

Details

Formally this function is of class WeightingFunction with the additional attributes name and acronym.

Term frequency $\mathit{tf}_{i,j}$ counts the number of occurrences $n_{i,j}$ of a term $t_i$ in a document $d_j$. In the case of normalization, the term frequency $\mathit{tf}_{i,j}$ is divided by $\sum_k n_{k,j}$.

Inverse document frequency for a term $t_i$ is defined as $\mathit{idf}_i = \log_2 \frac{|D|}{|\{d \mid t_i \in d\}|}$, where $|D|$ denotes the total number of documents and $|\{d \mid t_i \in d\}|$ is the number of documents in which the term $t_i$ appears.

Term frequency - inverse document frequency is now defined as $\mathit{tf}_{i,j} \cdot \mathit{idf}_i$.
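
The definitions above can be traced by hand on a small matrix. The sketch below uses base R only and is not tm's implementation: the helper `tf_idf()` and the toy counts are hypothetical, and it assumes every document contains at least one term and every term occurs in at least one document.

```r
# Hypothetical toy counts: rows are terms, columns are documents.
counts <- matrix(
  c(2, 1, 1,
    1, 0, 0,
    0, 3, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(Terms = c("oil", "price", "market"),
                  Docs  = c("d1", "d2", "d3"))
)

# Illustrative helper (not part of tm) that follows the formulas above.
tf_idf <- function(counts, normalize = TRUE) {
  tf <- counts
  if (normalize) {
    # Divide each column j by the total number of term occurrences in d_j.
    tf <- sweep(counts, 2, colSums(counts), "/")
  }
  n_docs   <- ncol(counts)          # |D|
  doc_freq <- rowSums(counts > 0)   # number of documents containing t_i
  idf      <- log2(n_docs / doc_freq)
  # idf has one entry per row, so it is recycled down each column
  # and every row i is scaled by idf_i.
  tf * idf
}

w_norm <- tf_idf(counts, normalize = TRUE)
w_raw  <- tf_idf(counts, normalize = FALSE)
print(round(w_norm, 3))
```

Here "oil" occurs in all three documents, so its inverse document frequency is $\log_2(3/3) = 0$ and its whole row is weighted to zero, while "price" occurs in only one document and receives the largest factor, $\log_2 3$.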

References

Gerard Salton and Christopher Buckley (1988). Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24/5, 513--523.