quanteda (version 0.9.4)

weight: weight the feature frequencies in a dfm by various methods

Description

Returns a document by feature matrix with the feature frequencies weighted according to one of several common methods.

Usage

weight(x, ...)

## S3 method for class 'dfm'
weight(x, type = c("frequency", "relFreq", "relMaxFreq", "logFreq", "tfidf"), ...)

smoother(x, smoothing = 1)

weighting(object)

## S3 method for class 'dfm'
weighting(object)

Arguments

x
document-feature matrix created by dfm
...
not currently used. For finer-grained control, consider calling tf or tfidf directly.
type
The weighting function to apply to the dfm. One of: "frequency", "relFreq", "relMaxFreq", "logFreq", "tfidf"
smoothing
constant added to the dfm cells for smoothing, default is 1
object
the dfm object for accessing the weighting setting
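
As a rough illustration of what some of these weightings compute, the sketch below works on a plain base R count matrix rather than a dfm, so it does not call quanteda at all. The toy data and all variable names are hypothetical, and the formulas are the usual textbook definitions (row proportions, division by the row maximum, add-one smoothing, and count times log10(N / document frequency)); they are assumptions and may differ in detail from what weight() and smoother() do internally.

```r
# Toy document-feature count matrix (hypothetical data): 3 documents, 3 features.
counts <- matrix(c(2, 0, 1,
                   0, 3, 1,
                   4, 1, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(docs = c("d1", "d2", "d3"),
                                 features = c("a", "b", "c")))

# Relative frequency: each count divided by its document (row) total.
rel_freq <- counts / rowSums(counts)

# Relative maximum frequency: each count divided by the largest count
# in the same document.
rel_max_freq <- counts / apply(counts, 1, max)

# Smoothing: add a constant to every cell (1 mirrors the documented default).
smoothed <- counts + 1

# tf-idf (assumed textbook form): count * log10(N / document frequency),
# where N is the number of documents.
doc_freq <- colSums(counts > 0)
tf_idf <- t(t(counts) * log10(nrow(counts) / doc_freq))

print(round(rel_freq, 3))
print(round(tf_idf, 3))
```

Dividing a matrix by a vector of row totals relies on R's column-major recycling, which lines the i-th total up with the i-th row.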

Value

  • The dfm with weighted values

    weighting returns a character object describing the type of weighting applied to the dfm.

Details

This converts the matrix from sparse to dense format, so it may exceed available memory, depending on the size of your input matrix.
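
To get a feel for when the dense conversion becomes a problem, the back-of-the-envelope estimate below may help. It assumes each cell of the dense result is stored as an 8-byte double, and the dimensions are hypothetical numbers chosen for illustration, not measurements of any real corpus.

```r
# Hypothetical dimensions for illustration: 10,000 documents by 50,000 features.
n_docs <- 10000
n_features <- 50000

# A dense numeric matrix stores one 8-byte double per cell,
# regardless of how many cells are zero.
dense_bytes <- n_docs * n_features * 8
dense_gib <- dense_bytes / 1024^3

cat(sprintf("Approximate dense size: %.1f GiB\n", dense_gib))
```

A sparse representation only stores the non-zero cells, so the gap between the two grows with the number of features that occur in just a few documents.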

weighting queries (but cannot set) the weighting applied to the dfm.

References

Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Vol. 1. Cambridge: Cambridge University Press, 2008.

See Also

tfidf

Examples

dtm <- dfm(inaugCorpus)
x <- apply(dtm, 1, function(tf) tf/max(tf))
topfeatures(dtm)
normDtm <- weight(dtm, "relFreq")
topfeatures(normDtm)
maxTfDtm <- weight(dtm, type="relMaxFreq")
topfeatures(maxTfDtm)
logTfDtm <- weight(dtm, type="logFreq")
topfeatures(logTfDtm)
tfidfDtm <- weight(dtm, type="tfidf")
topfeatures(tfidfDtm)

# combine these methods for more complex weightings, e.g. as in Section 6.4 of
# Introduction to Information Retrieval
head(logTfDtm <- weight(dtm, type="logFreq"))
head(tfidf(logTfDtm, normalize = FALSE))

testdfm <- dfm(inaugTexts[1:5], verbose = FALSE)
for (w in c("frequency", "relFreq", "relMaxFreq", "logFreq", "tfidf")) {
    testw <- weight(testdfm, w)
    cat("\n\n=== weight() TEST for:", w, "; class:", class(testw), "\n")
    print(head(testw))  # values are not auto-printed inside a for loop
}
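
The combined weighting mentioned above (log-scaled term frequency multiplied by inverse document frequency) can also be worked through by hand. The following is a minimal base R sketch, not quanteda's implementation: it assumes the sublinear scaling 1 + log10(tf) for tf > 0 and idf = log10(N / df), and uses a hypothetical toy matrix.

```r
# Toy counts (hypothetical data): 3 documents, 3 features.
counts <- matrix(c(10,   0, 1,
                    0, 100, 1,
                    1,   0, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(docs = c("d1", "d2", "d3"),
                                 features = c("a", "b", "c")))

# Sublinear tf scaling: 1 + log10(tf) where tf > 0, and 0 otherwise.
log_tf <- ifelse(counts > 0, 1 + log10(counts), 0)

# Inverse document frequency: log10(N / df), with N the number of documents
# and df the number of documents containing each feature.
doc_freq <- colSums(counts > 0)
idf <- log10(nrow(counts) / doc_freq)

# Multiply each feature (column) of the log-scaled matrix by its idf.
wf_idf <- sweep(log_tf, 2, idf, `*`)

print(round(wf_idf, 3))
```

Feature "c" occurs in every document, so its idf is zero and it contributes nothing under this scheme, however often it appears.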
