For a dfm object, returns a (weighted) document frequency for each term. The default is a simple count of the number of documents in which a feature occurs more than a given frequency threshold. (The default threshold is zero, meaning that any feature occurring at least once in a document will be counted.)
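The default count rule can be illustrated without quanteda at all. The sketch below assumes a plain base-R count matrix with documents as rows and features as columns. `count_docfreq()` is a hypothetical helper, not part of quanteda; it only mimics the counting rule described above.

```r
# Hypothetical helper (not part of quanteda): for each feature (column),
# count the documents (rows) in which it occurs more than `threshold` times.
count_docfreq <- function(m, threshold = 0) {
  colSums(m > threshold)
}

# Documents as rows, features as columns (counts from the Wikipedia tf-idf example)
m <- matrix(c(1, 1, 2, 1, 0, 0,
              1, 1, 0, 0, 2, 3),
            byrow = TRUE, nrow = 2,
            dimnames = list(c("document1", "document2"),
                            c("this", "is", "a", "sample", "another", "example")))

count_docfreq(m)                 # this 2, is 2, a 1, sample 1, another 1, example 1
count_docfreq(m, threshold = 1)  # this 0, is 0, a 1, sample 0, another 1, example 1
```

Raising the threshold to 1 means a document only counts toward a feature's document frequency if the feature appears there at least twice.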
docfreq(x, scheme = c("count", "inverse", "inversemax", "inverseprob",
                      "unary"), smoothing = 0, k = 0, base = 10, threshold = 0,
        use.names = TRUE)
x: a dfm
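scheme: type of document frequency weighting, one of:
  count: the number of documents in which a feature occurs more than threshold times (the default)
  inverse: inverse document frequency, the logarithm of the number of documents divided by the feature's document count
  inversemax: inverse document frequency relative to the largest document count of any feature
  inverseprob: probabilistic inverse document frequency
  unary: 1 for each feature
The three inverse schemes take logarithms to the given base and use smoothing and k as described for those arguments.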
smoothing: constant added to the quotient before taking the logarithm
k: constant added to the denominator in the inverse weighting types ("inverse", "inversemax", "inverseprob"), to prevent division by a zero document count for a term
base: the base with respect to which logarithms in the inverse document frequency weightings are computed; default is 10 (see Manning, Raghavan, and Schütze 2008, p. 123).
threshold: numeric value of the threshold above which a feature will be considered in the computation of document frequency. The default is 0, meaning that a feature's document frequency is the number of documents in which it occurs more than zero times.
use.names: logical; if TRUE, attach feature labels as names of the resulting numeric vector
...: further arguments, not used
Returns a numeric vector of document frequencies (or document frequency weights), one for each feature.
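Written out, the schemes appear to correspond to the standard idf variants. The block below is a reconstruction from the argument descriptions, not taken from the package source, so check it against the implementation before relying on it. Notation (assumed): N is the number of documents, n_ij the count of feature j in document i, df_j the count document frequency, t = threshold, s = smoothing, k the denominator constant, b = base. It further assumes that smoothing enters only inverse and inversemax, and that inverseprob is floored at zero as in the "prob idf" weighting of Manning, Raghavan, and Schütze (2008).

```latex
\begin{aligned}
\text{count:}       \quad & df_j = \left|\{\, i : n_{ij} > t \,\}\right| \\
\text{inverse:}     \quad & \log_{b}\!\left(s + \frac{N}{k + df_j}\right) \\
\text{inversemax:}  \quad & \log_{b}\!\left(s + \frac{\max_{j'} df_{j'}}{k + df_j}\right) \\
\text{inverseprob:} \quad & \max\!\left\{0,\ \log_{b}\frac{N - df_j}{k + df_j}\right\} \\
\text{unary:}       \quad & 1
\end{aligned}
```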
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.
library("quanteda")

# document-feature matrix of the first two inaugural addresses
mydfm <- dfm(tokens(data_corpus_inaugural[1:2]))
docfreq(mydfm[, 1:20])
# replication of worked example from
# https://en.wikipedia.org/wiki/Tf-idf#Example_of_tf.E2.80.93idf
wiki_dfm <-
    matrix(c(1, 1, 2, 1, 0, 0,
             1, 1, 0, 0, 2, 3),
           byrow = TRUE, nrow = 2,
           dimnames = list(docs = c("document1", "document2"),
                           features = c("this", "is", "a", "sample",
                                        "another", "example"))) %>%
    as.dfm()
wiki_dfm
docfreq(wiki_dfm)
docfreq(wiki_dfm, scheme = "inverse")
docfreq(wiki_dfm, scheme = "inverse", k = 1, smoothing = 1)
docfreq(wiki_dfm, scheme = "unary")
docfreq(wiki_dfm, scheme = "inversemax")
docfreq(wiki_dfm, scheme = "inverseprob")
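As a cross-check on the worked example, the inverse-type weights can be recomputed in base R. `idf_variants()` is a hypothetical helper, not part of quanteda, and it encodes assumed formulas (s = smoothing added to the quotient, k added to the denominator, logarithms in `base`, inverseprob floored at zero), so compare its output with `docfreq()` rather than assuming the two always agree. With base 10 it reproduces the Wikipedia values idf("this") = 0 and idf("example") = log10(2), about 0.301.

```r
# Hypothetical helper (not part of quanteda): recompute document frequency
# weightings from a plain count matrix under assumed formulas.
idf_variants <- function(m, smoothing = 0, k = 0, base = 10) {
  doc_freq <- colSums(m > 0)  # "count" scheme with threshold 0
  n_docs <- nrow(m)           # N, the number of documents
  list(
    count       = doc_freq,
    inverse     = log(smoothing + n_docs / (k + doc_freq), base = base),
    inversemax  = log(smoothing + max(doc_freq) / (k + doc_freq), base = base),
    # floored at zero, as in the "prob idf" weighting of Manning et al. (2008)
    inverseprob = pmax(log((n_docs - doc_freq) / (k + doc_freq), base = base), 0),
    unary       = setNames(rep(1, length(doc_freq)), names(doc_freq))
  )
}

wiki <- matrix(c(1, 1, 2, 1, 0, 0,
                 1, 1, 0, 0, 2, 3),
               byrow = TRUE, nrow = 2,
               dimnames = list(c("document1", "document2"),
                               c("this", "is", "a", "sample", "another", "example")))

idf <- idf_variants(wiki)
round(idf$inverse, 3)  # this 0, is 0, and 0.301 for the other four features
idf_variants(wiki, smoothing = 1, k = 1)$inverse
```

Because the largest document count here equals the number of documents, inversemax coincides with inverse on this small matrix.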