tfidf

TF-IDF

This function calculates a variant of TF-IDF. The input is assumed to contain relative frequencies. IDF is calculated as follows: \(idf_t = \log\frac{N+1}{n_t}\), where \(N\) is the total number of documents (i.e., rows) and \(n_t\) is the number of documents containing term \(t\). We add one to the numerator so that terms appearing in every document do not receive an IDF of 0: with the classic \(\log\frac{N}{n_t}\) their weight would be \(\log 1 = 0\), whereas here it is \(\log\frac{N+1}{N} > 0\).
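A minimal base-R sketch of the formula above, assuming the natural logarithm (R's default `log()`) and that a term "occurs" in a document when its value is greater than zero. The function name `tfidf_sketch` is hypothetical; this is an illustration, not the DramaAnalysis source code. In this sketch, a column that is zero in every row gives \(n_t = 0\) and therefore `NaN` values.

```r
# Hypothetical re-implementation of the documented IDF variant;
# not the DramaAnalysis source code.
tfidf_sketch <- function(ftable) {
  ftable <- as.matrix(ftable)
  N   <- nrow(ftable)              # number of documents (rows)
  n_t <- colSums(ftable > 0)       # number of documents containing each term
  idf <- log((N + 1) / n_t)        # natural log, +1 in the numerator
  sweep(ftable, 2, idf, `*`)       # multiply column j by idf[j]
}
```

For a term present in every document the factor is \(\log\frac{N+1}{N}\), e.g. \(\log\frac{4}{3} \approx 0.288\) for \(N = 3\), so its weight stays positive.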

Usage
tfidf(ftable)
Arguments
ftable

A matrix with "documents" as rows and "terms" as columns. Values are assumed to be normalized by document, i.e., each row contains relative frequencies.
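The example below obtains relative frequencies via `normalize=TRUE`. If you start from a matrix of raw counts produced elsewhere, a sketch of the row-wise normalization in base R uses `prop.table()`; the `counts` matrix is made-up illustration data.

```r
# Hypothetical raw term counts: 3 documents (rows) x 4 terms (columns).
counts <- matrix(c(1, 0, 1, 8,
                   2, 2, 2, 4,
                   0, 0, 1, 9),
                 nrow = 3, byrow = TRUE)

# Divide each row by its total so that every row sums to 1.
rel <- prop.table(counts, margin = 1)
rowSums(rel)
```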

Value

A matrix containing TF*IDF values instead of relative frequencies.

Aliases
  • tfidf
Examples
# NOT RUN {
data(rksp.0)
ftable <- frequencytable(rksp.0, byCharacter=TRUE, normalize=TRUE)
rksp.0.tfidf <- tfidf(ftable)
# Toy matrix: 3 documents (rows) x 4 terms (columns); each row sums to 1
mat <- matrix(c(0.1, 0,   0.1, 0.8,
                0.2, 0.2, 0.2, 0.4,
                0,   0,   0.1, 0.9),
              nrow=3, ncol=4, byrow=TRUE)
mat2 <- tfidf(mat)
print(mat2)
# }
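The toy matrix from the example can be worked through by hand using the documented formula. The sketch below does this in base R without calling the package, assuming the natural logarithm and that a term counts as present when its value is greater than zero.

```r
# Toy matrix from the example, written row by row
# (rows = documents, columns = terms; each row sums to 1).
mat <- matrix(c(0.1, 0,   0.1, 0.8,
                0.2, 0.2, 0.2, 0.4,
                0,   0,   0.1, 0.9),
              nrow = 3, ncol = 4, byrow = TRUE)

N   <- nrow(mat)              # 3 documents
n_t <- colSums(mat > 0)       # c(2, 1, 3, 3)
idf <- log((N + 1) / n_t)     # c(log 2, log 4, log 4/3, log 4/3)

# Expected TF-IDF values: column j scaled by idf[j]
# (rep(..., each = N) matches R's column-major storage).
expected <- mat * rep(idf, each = N)
round(expected, 4)
```

If the installed `tfidf()` follows the formula documented above, `all.equal(mat2, expected, check.attributes = FALSE)` should return `TRUE`; this is an assumption about the package, not something verified here.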
Documentation reproduced from package DramaAnalysis, version 3.0.0, License: GPL (>= 3)