Apply varieties of term frequency weightings to a dfm.
tf(x, scheme = c("count", "prop", "propmax", "boolean", "log", "augmented",
  "logave"), base = 10, K = 0.5)
x: the document-feature matrix (dfm) to which the term frequency weighting will be applied
scheme: the weighting scheme used to normalize feature frequencies within each document. Valid types include:
count: the default; feature counts are left unchanged, equivalent to dividing by 1
prop: feature proportions within each document, equivalent to dividing each feature count by the total count of features in the document
propmax: feature proportions relative to the most frequent feature in the document, equivalent to dividing each feature count by the count of the most frequent feature in the document
boolean: recode all non-zero counts as 1
log: take the logarithm of 1 + each count, using the logarithm base given by base
augmented: equivalent to K + (1 - K) * tf(x, "propmax")
logave: (1 + the log of the counts) / (1 + log of the counts / the average count within document)
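The simpler schemes can be illustrated with a minimal base R sketch. It uses a plain numeric matrix as a stand-in for a dfm and is not the package's implementation; the object names (counts, prop, propmax, bool) are chosen here for illustration only.

```r
# Toy document-feature matrix: rows are documents, columns are features.
# A plain base R matrix is assumed here in place of a real dfm object.
counts <- matrix(c(2, 0, 1,
                   0, 4, 4),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2"), c("a", "b", "c")))

# "prop": divide each count by the total feature count of its document.
# Dividing a matrix by a vector of row sums recycles down the columns,
# so every entry in row i is divided by the total for row i.
prop <- counts / rowSums(counts)

# "propmax": divide each count by the largest count in its document.
propmax <- counts / apply(counts, 1, max)

# "boolean": recode every non-zero count as 1, leaving zeros as 0.
bool <- (counts > 0) * 1

print(prop)
print(propmax)
print(bool)
```

Under "prop" each row sums to 1, and under "propmax" the most frequent feature in each document receives the value 1.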
base: base for the logarithm when scheme is "log" or "logave"
K: the constant K used in the augmentation when scheme = "augmented"
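How base and K enter the "log" and "augmented" schemes can be sketched in base R as follows. This is an illustration on a plain matrix, following the descriptions above, and is not the package's code; the helper names tf_log and tf_augmented are hypothetical.

```r
# Toy document-feature matrix (a plain base R matrix standing in for a dfm).
counts <- matrix(c(9, 0, 99,
                   0, 4, 4),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2"), c("a", "b", "c")))

# Hypothetical helper for the "log" scheme as described above:
# the logarithm of 1 + each count, in the requested base.
tf_log <- function(m, base = 10) {
  log(1 + m, base = base)
}

# Hypothetical helper for the "augmented" scheme:
# K + (1 - K) * (count / largest count in the same document).
tf_augmented <- function(m, K = 0.5) {
  K + (1 - K) * (m / apply(m, 1, max))
}

print(tf_log(counts, base = 10))
print(tf_augmented(counts, K = 0.5))
```

On this dense matrix, zero counts map to 0 under tf_log and to K under tf_augmented. Whether the package applies the augmentation to zero counts in the same way is not stated in the text above, so treat that detail as an assumption of the sketch.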
A document-feature matrix to which the weighting scheme has been applied.
tf(x, scheme = "prop") is equivalent to weight(x, "relFreq").
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.