mlvocab (version 0.1)

tfidf: Tfidf re-weighting of dtm and tdm matrices

Description

Re-weight a document-term matrix (dtm) or term-document matrix (tdm) using tf-idf (term frequency-inverse document frequency) weights.

Usage

tfidf(mat, vocab, norm = c("l1", "l2", "none"), sublinear_tf = FALSE,
  extra_df_count = 1)

Arguments

mat

matrix returned by dtm() or tdm()

vocab

output of vocab() or update_vocab()

norm

normalization to apply to each document; one of "l1", "l2" or "none"

sublinear_tf

when TRUE, use 1 + log(tf) instead of the raw tf

extra_df_count

add this number to the document count, as if every term in the vocabulary had been seen in at least this many additional documents.
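
To make the interaction of norm, sublinear_tf and extra_df_count concrete, here is a minimal base-R sketch of tf-idf weighting on a small dense matrix. It is not mlvocab's implementation: tfidf_sketch() is a hypothetical helper, and both the smoothed idf formula and the choice to normalize after weighting are assumptions made for illustration.

```r
# Hypothetical helper for illustration only; not part of mlvocab.
# Expects a dense document-term matrix (documents in rows, terms in columns).
tfidf_sketch <- function(dtm, norm = c("l1", "l2", "none"),
                         sublinear_tf = FALSE, extra_df_count = 1) {
  norm <- match.arg(norm)
  n_docs <- nrow(dtm)

  # Document frequency: number of documents in which each term occurs.
  df <- colSums(dtm > 0)

  # Assumed smoothing: extra_df_count is added to both the number of
  # documents and each term's document frequency.
  idf <- log((n_docs + extra_df_count) / (df + extra_df_count)) + 1

  tf <- dtm
  if (sublinear_tf) {
    nz <- tf > 0
    tf[nz] <- 1 + log(tf[nz])   # dampen large counts; zeros stay zero
  }

  # Multiply each term column by its idf weight.
  w <- sweep(tf, 2, idf, `*`)

  # Normalize each document (row); the ordering here is an assumption.
  scale <- switch(norm,
    l1 = rowSums(abs(w)),
    l2 = sqrt(rowSums(w^2)),
    none = rep(1, n_docs)
  )
  scale[scale == 0] <- 1        # leave empty documents untouched
  w / scale
}

# Toy counts for two documents and three terms.
counts <- matrix(c(2, 1, 0,
                   4, 0, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("a", "b"), c("the", "quick", "dog")))

w_l1   <- tfidf_sketch(counts, norm = "l1")
w_l2   <- tfidf_sketch(counts, norm = "l2")
w_none <- tfidf_sketch(counts, norm = "none", sublinear_tf = TRUE)
```

In this sketch, "l1" rescales each document so that its weights sum to 1, "l2" rescales to unit Euclidean length, and "none" leaves the weighted counts as they are. A larger extra_df_count pulls the idf values of rare and common terms closer together.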

Examples

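library(mlvocab)

# Two tokenized documents; "b" repeats the sentence twice
corpus <- list(a = c("The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"),
               b = c("the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog",
                     "the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"))
# Vocabulary of unigrams and bigrams, with bigram parts joined by a space
v <- vocab(corpus, c(1, 2), " ")
# Re-weight the document-term matrix
dtm <- dtm(corpus, v)
tfidf(dtm, v)
# Re-weight the term-document matrix
tdm <- tdm(corpus, v)
tfidf(tdm, v)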