
textir (version 2.0-4)

tfidf: tf-idf

Description

Computes the term frequency-inverse document frequency (tf-idf) transform of a matrix of token counts.

Usage

tfidf(x, normalize=TRUE)

Arguments

x
A dgCMatrix or matrix of counts, with one row per document and one column per token.
normalize
Logical. If TRUE, each count is divided by its document total, so the term frequency is a within-document proportion; if FALSE, the raw count is used.

Value

  • A matrix of the same type as x, with each count replaced by its tf-idf weight $$f_{ij} \cdot \log[n/(d_j+1)],$$ where $f_{ij}$ is $x_{ij}/m_i$ if normalize is TRUE and $x_{ij}$ otherwise, $m_i$ is the total count in document $i$, $n$ is the number of documents, and $d_j$ is the number of documents containing token $j$.
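
The formula can be checked by hand on a small dense matrix. The following is a minimal base R sketch, not the package's source code: tfidf_sketch and the counts matrix are hypothetical names and data invented for illustration, and the sketch assumes rows are documents, columns are tokens, and no document is empty (an all-zero row would give NaN when normalizing).

```r
# Hypothetical helper illustrating the formula above; NOT textir's implementation.
tfidf_sketch <- function(x, normalize = TRUE) {
  x <- as.matrix(x)
  n <- nrow(x)                  # n: number of documents
  d <- colSums(x > 0)           # d_j: documents containing token j
  # f_ij: counts divided by document totals m_i, or raw counts
  tf <- if (normalize) x / rowSums(x) else x
  # multiply each column j by log(n / (d_j + 1))
  sweep(tf, 2, log(n / (d + 1)), `*`)
}

# Made-up counts: 4 documents, 3 tokens
counts <- matrix(c(2, 0, 1,
                   0, 3, 1,
                   1, 1, 0,
                   0, 0, 4),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:4), c("a", "b", "c")))

w <- tfidf_sketch(counts)
```

Here token "c" appears in 3 of the 4 documents, so its weight is $\log[4/(3+1)] = 0$ in every row. Under this formula, a token that appears in every document gets a negative weight because of the $+1$ in the denominator.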

See Also

pls, we8there

Examples

library(textir)
data(we8there)
## 20 high-variance tf-idf terms
colnames(we8thereCounts)[
  order(-sdev(tfidf(we8thereCounts)))[1:20]]
