Weight a term-document matrix by term frequency - inverse document frequency.

`weightTfIdf(m, normalize = TRUE)`

m

A `TermDocumentMatrix` in term frequency format.

normalize

A Boolean value indicating whether the term frequencies should be normalized.

The weighted matrix.

Formally, this function is of class `WeightingFunction`, with the additional attributes `name` and `acronym`.

*Term frequency* \(\mathit{tf}_{i,j}\) counts the number of
occurrences \(n_{i,j}\) of a term \(t_i\) in a document
\(d_j\). If `normalize = TRUE`, the term frequency
\(\mathit{tf}_{i,j}\) is divided by the total number of term
occurrences in \(d_j\), that is, by \(\sum_k n_{k,j}\).
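The term frequency step can be sketched in plain Python. This is not the package's implementation: the nested-list layout (`counts[i][j]` holds \(n_{i,j}\), rows are terms and columns are documents, mirroring a term-document matrix) and the function name `term_frequencies` are hypothetical, and leaving an empty document at 0 is an assumption the text above does not specify.

```python
def term_frequencies(counts, normalize=True):
    """Return tf_{i,j} for a term-document count matrix (rows = terms, columns = documents)."""
    n_docs = len(counts[0]) if counts else 0
    # Column sums: sum_k n_{k,j}, the total number of term occurrences in document d_j.
    doc_lengths = [sum(row[j] for row in counts) for j in range(n_docs)]
    tf = []
    for row in counts:
        if normalize:
            # Assumption: an empty document (length 0) keeps tf = 0 instead of dividing by zero.
            tf.append([n / doc_lengths[j] if doc_lengths[j] else 0.0 for j, n in enumerate(row)])
        else:
            # Without normalization, tf_{i,j} is just the raw count n_{i,j}.
            tf.append([float(n) for n in row])
    return tf
```

For example, `term_frequencies([[2, 0], [1, 1], [1, 3]])` gives `[[0.5, 0.0], [0.25, 0.25], [0.25, 0.75]]`, since both documents contain four term occurrences.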

*Inverse document frequency* for a term \(t_i\) is defined as
$$\mathit{idf}_i = \log_2 \frac{|D|}{|\{d \mid t_i \in d\}|}$$
where \(|D|\) denotes the total number of documents and
\(|\{d \mid t_i \in d\}|\) is the number of documents in which the
term \(t_i\) appears.
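The inverse document frequency can be sketched the same way, again as a hypothetical plain-Python illustration rather than the package's code. It assumes the same `counts[i][j]` layout (rows are terms, columns are documents); the function name is illustrative, and returning 0 for a term that occurs in no document is an assumption, since the formula above would divide by zero.

```python
import math


def inverse_document_frequency(counts):
    """Return idf_i = log2(|D| / |{d : t_i in d}|) for each term (row) of a count matrix."""
    n_docs = len(counts[0]) if counts else 0  # |D|
    idf = []
    for row in counts:
        # Document frequency: number of documents in which term t_i occurs at least once.
        df = sum(1 for n in row if n > 0)
        # Assumption: a term absent from every document gets idf 0 rather than an error.
        idf.append(math.log2(n_docs / df) if df else 0.0)
    return idf
```

With four documents, a term found in one of them gets \(\log_2 4 = 2\), a term found in two gets \(\log_2 2 = 1\), and a term found in all four gets \(\log_2 1 = 0\).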

*Term frequency - inverse document frequency* is then defined as the
product \(\mathit{tf}_{i,j} \cdot \mathit{idf}_i\).
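Putting the two definitions together gives the full weighting. The sketch below is a self-contained, hypothetical plain-Python version of the formulas described here, not the package's implementation. It assumes a nested-list matrix with rows as terms and columns as documents. The `normalize` flag mirrors the argument of `weightTfIdf`, and the zero fallbacks for empty documents and absent terms are assumptions.

```python
import math


def weight_tf_idf(counts, normalize=True):
    """Return tf_{i,j} * idf_i for a term-document count matrix (rows = terms)."""
    n_docs = len(counts[0]) if counts else 0
    # sum_k n_{k,j}: total term occurrences per document, used when normalizing.
    doc_lengths = [sum(row[j] for row in counts) for j in range(n_docs)]
    weights = []
    for row in counts:
        df = sum(1 for n in row if n > 0)
        # Assumption: idf = 0 for a term that occurs in no document.
        idf = math.log2(n_docs / df) if df else 0.0
        out = []
        for j, n in enumerate(row):
            if normalize:
                # Assumption: an empty document keeps tf = 0.
                tf = n / doc_lengths[j] if doc_lengths[j] else 0.0
            else:
                tf = float(n)
            out.append(tf * idf)
        weights.append(out)
    return weights
```

For `[[2, 0, 0, 0], [1, 1, 0, 0], [1, 3, 2, 2]]`, the normalized result is `[[1.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]`. The last term occurs in every document, so its idf, and therefore every one of its weights, is 0.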

Gerard Salton and Christopher Buckley (1988).
Term-weighting approaches in automatic text retrieval.
*Information Processing and Management*, **24**/5, 513--523.