dfm_weight(x, type = c("frequency", "relFreq", "relMaxFreq", "logFreq",
  "tfidf"), weights = NULL)

dfm_smooth(x, smoothing = 1)
"frequency"
"relFreq"
"relMaxFreq"
"logFreq"
"tfidf"
For finer control over tf-idf weighting, call tfidf directly.
If type is unused, then weights can be a named numeric vector of weights to be applied to the dfm, where the names of the vector correspond to feature labels of the dfm. The weights are applied as multipliers to the existing feature counts for the corresponding named features. Any features not named are assigned a weight of 1.0 (meaning they are unchanged).
See also: tf, tfidf, docfreq
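To make the weighting arithmetic concrete, here is a minimal base R sketch on a plain count matrix rather than a real dfm. The matrix `m` and the helper `apply_weights` are hypothetical names introduced for illustration. The readings of "relFreq" and "relMaxFreq" are assumptions inferred from the type names, and this is not quanteda's own implementation. "logFreq" and "tfidf" are left out because their exact formulas are not stated here.

```r
# Toy document-feature count matrix (made-up data, not a quanteda dfm).
m <- matrix(c(1, 1, 0,
              1, 2, 1),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("text1", "text2"),
                            c("apple", "banana", "much")))

# Assumed reading of "relFreq": each count divided by its document (row) total.
rel_freq <- m / rowSums(m)

# Assumed reading of "relMaxFreq": each count divided by the largest
# count in the same document.
rel_max_freq <- m / apply(m, 1, max)

# Hypothetical helper: named weights act as per-feature multipliers;
# features that are not named keep an implicit weight of 1.
apply_weights <- function(counts, weights) {
  w <- rep(1, ncol(counts))
  names(w) <- colnames(counts)
  known <- intersect(names(weights), names(w))
  w[known] <- weights[known]
  sweep(counts, 2, w, `*`)  # multiply each column by its weight
}

weighted <- apply_weights(m, c(apple = 5, banana = 3))
```

In this sketch the `much` column of `weighted` is identical to the one in `m`, which mirrors the rule that unnamed features are left unchanged.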
dtm <- dfm(data_corpus_inaugural)
x <- apply(dtm, 1, function(tf) tf/max(tf))
topfeatures(dtm)
normDtm <- dfm_weight(dtm, "relFreq")
topfeatures(normDtm)
maxTfDtm <- dfm_weight(dtm, type = "relMaxFreq")
topfeatures(maxTfDtm)
logTfDtm <- dfm_weight(dtm, type = "logFreq")
topfeatures(logTfDtm)
tfidfDtm <- dfm_weight(dtm, type = "tfidf")
topfeatures(tfidfDtm)
# combine these methods for more complex weightings, e.g. as in Section 6.4
# of Introduction to Information Retrieval
head(logTfDtm <- dfm_weight(dtm, type = "logFreq"))
head(tfidf(logTfDtm, normalize = FALSE))
# apply numeric weights
str <- c("apple is better than banana", "banana banana apple much better")
(mydfm <- dfm(str, remove = stopwords("english")))
dfm_weight(mydfm, weights = c(apple = 5, banana = 3, much = 0.5))
# smooth the dfm
dfm_smooth(mydfm, 0.5)
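As a rough illustration of what smoothing does, the base R sketch below assumes that the smoothing constant is simply added to every cell of the count matrix, including the zeros. That is an assumption about dfm_smooth, not a statement of its implementation. The `counts` matrix and the `smooth_counts` helper are hypothetical.

```r
# Toy count matrix (made-up data, not a quanteda dfm).
counts <- matrix(c(1, 0,
                   2, 3),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("text1", "text2"),
                                 c("apple", "banana")))

# Hypothetical helper: additive smoothing, assuming the constant is
# added to every cell so that no count remains at zero.
smooth_counts <- function(counts, smoothing = 1) {
  counts + smoothing
}

smoothed <- smooth_counts(counts, 0.5)
```

Under this assumption, the zero count for `banana` in `text1` becomes 0.5, which keeps later steps such as taking logarithms or computing ratios well defined.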