# weightTfIdf

##### Weight by Term Frequency - Inverse Document Frequency

Weight a term-document matrix by term frequency - inverse document frequency.

##### Usage

`weightTfIdf(m, normalize = TRUE)`

##### Arguments

- `m`: A `TermDocumentMatrix` in term frequency format.
- `normalize`: A Boolean value indicating whether the term frequencies should be normalized.

##### Details

Formally this function is of class `WeightingFunction` with the
additional attributes `Name` and `Acronym`.

*Term frequency* $\mathit{tf}_{i,j}$ counts the number of
occurrences $n_{i,j}$ of a term $t_i$ in a document
$d_j$. In the case of normalization, the term frequency
$\mathit{tf}_{i,j}$ is divided by $\sum_k n_{k,j}$.

*Inverse document frequency* for a term $t_i$ is defined as
$$\mathit{idf}_i = \log_2 \frac{|D|}{|\{d \mid t_i \in d\}|}$$
where $|D|$ denotes the total number of documents and
$|\{d \mid t_i \in d\}|$ is the number of documents in which the term
$t_i$ appears.

*Term frequency - inverse document frequency* is now defined as
$\mathit{tf}_{i,j} \cdot \mathit{idf}_i$.
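
The definitions above can be traced by hand on a small matrix. The following is a minimal base-R sketch of the formulas only, not the package's implementation; the count matrix `counts` and its term and document names are made up for illustration.

```r
# Hypothetical term-by-document count matrix n[i, j] (rows = terms, columns = documents).
counts <- matrix(
  c(2, 0, 0,
    1, 1, 0,
    0, 1, 2,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(Terms = c("apple", "banana", "cherry", "date"),
                  Docs  = c("d1", "d2", "d3"))
)

# Normalized term frequency: divide each column by its total count sum_k n[k, j].
tf <- sweep(counts, 2, colSums(counts), "/")

# Inverse document frequency: log2(|D| / number of documents containing the term).
idf <- log2(ncol(counts) / rowSums(counts > 0))

# tf-idf: each row i of tf is scaled by idf[i]
# (R recycles idf down the columns, so it lines up with the rows).
tfidf <- tf * idf
print(round(tfidf, 3))
```

A term that appears in every document would get $\mathit{idf}_i = \log_2 1 = 0$, so its weight vanishes regardless of how often it occurs.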

##### Value

- The weighted matrix.
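
A short usage sketch, assuming the tm package is installed; the three example documents are invented for illustration, and the exact terms kept depend on the default settings of `TermDocumentMatrix`.

```r
library(tm)

# Hypothetical toy corpus; any character vector of documents would do.
docs <- c("apple banana apple", "banana cherry", "cherry cherry date")
corpus <- VCorpus(VectorSource(docs))

# Build a term-document matrix of raw counts (term frequency format).
tdm <- TermDocumentMatrix(corpus)

# Apply tf-idf weighting with normalized term frequencies.
weighted <- weightTfIdf(tdm, normalize = TRUE)
inspect(weighted)
```

The same weighting can also be requested while building the matrix by passing the function in the control list, for example `TermDocumentMatrix(corpus, control = list(weighting = weightTfIdf))`.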

##### References

Gerard Salton and Christopher Buckley (1988).
Term-weighting approaches in automatic text retrieval.
*Information Processing and Management*, **24**(5), 513-523.

*Documentation reproduced from package tm, version 0.6-2, License: GPL-3*