Weight a dfm by term frequency-inverse document frequency (tf-idf) using fully sparse methods.
tfidf(x, scheme_tf = "count", scheme_df = "inverse", base = 10, ...)
x: object for which idf or tf-idf will be computed (a document-feature matrix)
scheme_tf: scheme for tf; defaults to "count"
scheme_df: scheme for docfreq; defaults to "inverse"
base: the base for the logarithms in the tf and docfreq calls; defaults to 10
...: additional arguments passed to docfreq when calling tfidf
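The idf part of the weighting, and the effect of base, can be illustrated with a small base R sketch. It assumes that scheme_df = "inverse" means log_base(N / df_j), where N is the number of documents and df_j the number of documents containing feature j. The helper inverse_df() and the object counts are hypothetical and not part of quanteda; the counts are those of the Wikipedia worked example used later on this page.

```r
# Hypothetical helper (not part of quanteda): inverse document frequency,
# assumed here to be log_base(N / df_j) as for scheme_df = "inverse".
inverse_df <- function(m, base = 10) {
  n_docs <- nrow(m)              # N: number of documents (rows)
  df <- colSums(m > 0)           # df_j: documents containing feature j
  log(n_docs / df, base = base)
}

# Term counts from the Wikipedia worked example (documents x features)
counts <- matrix(c(1, 1, 2, 1, 0, 0,
                   1, 1, 0, 0, 2, 3),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(docs = c("document1", "document2"),
                                 features = c("this", "is", "a", "sample",
                                              "another", "example")))

idf10 <- inverse_df(counts, base = 10)
idf2  <- inverse_df(counts, base = 2)
print(round(idf10, 3))
# A different base rescales every idf by the same constant 1 / log10(2)
print(all.equal(idf2, idf10 / log10(2)))
```

Because log_2(x) = log_10(x) / log_10(2), changing base multiplies every weight by the same constant, so it changes the magnitudes of the weights but not the ranking of features.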
tfidf computes term frequency-inverse document frequency (tf-idf) weighting: each term frequency is multiplied by the inverse document frequency of its feature. By default, term frequency is not normalized; set scheme_tf = "prop" to normalize it by computing relative term frequency within each document.
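The difference between the "count" and "prop" tf schemes can be shown with a short dense base R sketch. It assumes that the weight is the tf value multiplied by an idf of log_base(N / df_j), and that "prop" divides each count by its document's total. The function tfidf_sketch() is hypothetical and only illustrates the arithmetic; it is not how quanteda implements tfidf, which works on sparse dfm objects.

```r
# Hypothetical sketch (not quanteda's implementation): dense tf-idf with the
# tf scheme chosen between raw counts and within-document proportions.
tfidf_sketch <- function(m, scheme_tf = c("count", "prop"), base = 10) {
  scheme_tf <- match.arg(scheme_tf)
  tf <- if (scheme_tf == "prop") {
    sweep(m, 1, rowSums(m), `/`)   # divide each row by its document total
  } else {
    m                              # raw counts, no normalization
  }
  # inverse document frequency, assumed to be log_base(N / df_j)
  idf <- log(nrow(m) / colSums(m > 0), base = base)
  sweep(tf, 2, idf, `*`)           # multiply column j by idf[j]
}

# Term counts from the Wikipedia worked example (documents x features)
counts <- matrix(c(1, 1, 2, 1, 0, 0,
                   1, 1, 0, 0, 2, 3),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(docs = c("document1", "document2"),
                                 features = c("this", "is", "a", "sample",
                                              "another", "example")))

w_count <- tfidf_sketch(counts, "count")
w_prop  <- tfidf_sketch(counts, "prop")
print(round(w_count, 3))
print(round(w_prop, 3))
```

With these counts, "example" in document2 gets 3 × log10(2) ≈ 0.903 under "count" and (3/7) × log10(2) ≈ 0.129 under "prop", while features such as "this" that occur in every document get weight 0 under either scheme.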
Manning, C. D., Raghavan, P., & Schutze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.
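library(quanteda)
# raw counts for features 5 to 10
head(data_dfm_lbgexample[, 5:10])
# the same features after tf-idf weighting
head(tfidf(data_dfm_lbgexample)[, 5:10])
# document frequencies for features 5 to 15
docfreq(data_dfm_lbgexample)[5:15]
# term frequencies for features 5 to 10
head(tf(data_dfm_lbgexample)[, 5:10])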
# replication of worked example from
# https://en.wikipedia.org/wiki/Tf-idf#Example_of_tf.E2.80.93idf
(wikiDfm <- new("dfmSparse",
Matrix::Matrix(c(1,1,2,1,0,0, 1,1,0,0,2,3),
byrow = TRUE, nrow = 2,
dimnames = list(docs = c("document1", "document2"),
features = c("this", "is", "a", "sample", "another",
"example")), sparse = TRUE)))
docfreq(wikiDfm)
tfidf(wikiDfm)
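The Wikipedia example can be checked by hand. Assuming the defaults (scheme_tf = "count", scheme_df = "inverse", base = 10) and an inverse document frequency of log10(N / df) with N = 2 documents, a few of the weights work out as follows. The last line uses proportional tf (document1 has 5 tokens, document2 has 7), which corresponds to the calculation in the Wikipedia article and to scheme_tf = "prop".

```latex
\begin{aligned}
w(\text{this}, d_1) &= 1 \times \log_{10}\tfrac{2}{2} = 0 \\
w(\text{example}, d_2) &= 3 \times \log_{10}\tfrac{2}{1} \approx 3 \times 0.301 \approx 0.903 \\
w_{\text{prop}}(\text{example}, d_2) &= \tfrac{3}{7} \times \log_{10}\tfrac{2}{1} \approx 0.429 \times 0.301 \approx 0.129
\end{aligned}
```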