This set of functions scales a document-term matrix.

`transform_tf`: scale a DTM by one of two methods. If `norm = "l1"`, then `dtm_tf = (count of a particular word in the document) / (total number of words in the document)`. If `norm = "l2"`, then `dtm_tf = (count of a particular word in the document) ^ 2 / (total number of words in the document) ^ 2`.
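As a rough illustration of the `"l1"` case described above, the following base-R sketch divides each count by its document's total word count. The toy matrix `dtm_toy` and its term names are invented for the example, and a dense `matrix` stands in for the sparse classes the package works with; this is not the package's own implementation.

```r
# Toy document-term matrix (hypothetical data): rows are documents, columns are terms.
dtm_toy <- matrix(c(2, 1, 1,
                    0, 3, 1),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("doc1", "doc2"),
                                  c("apple", "banana", "cherry")))

# L1 scaling: divide every count by the total number of words in its document.
# A vector of length nrow() is recycled down the columns, so row i is divided
# by rowSums(dtm_toy)[i].
tf_l1 <- dtm_toy / rowSums(dtm_toy)

tf_l1
rowSums(tf_l1)  # each document's scaled counts sum to 1
```

After this scaling, documents of different lengths become comparable because each row is a distribution over terms.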

`transform_binary`: scale a DTM so that a cell is 1 if a word appears in the document; otherwise it is 0.
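The binary representation can be sketched in base R as follows. The toy matrix `counts` is hypothetical, and a dense `matrix` is used in place of a sparse one for brevity; this only illustrates the idea, not how the package does it.

```r
# Toy document-term counts (hypothetical data).
counts <- matrix(c(2, 0, 1,
                   0, 3, 0),
                 nrow = 2, byrow = TRUE)

# A cell becomes 1 if the word appears at least once in the document, else 0.
# The comparison yields a logical matrix; multiplying by 1 makes it numeric.
binary <- (counts > 0) * 1

binary
```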

`transform_tfidf`: scale a DTM so that `dtm_idf = log(count of a particular word in a document) / (number of documents where the term appears + 1)`.

`transform_tf(dtm, sublinear_tf = FALSE, norm = c("l1", "l2", "none"))`

`transform_tfidf(dtm, idf = NULL, sublinear_tf = FALSE, norm = c("l1", "l2"))`

`transform_binary(dtm)`

dtm

a document-term matrix of class `dgCMatrix` or `dgTMatrix`.

sublinear_tf

`logical`, `FALSE` by default. Apply sublinear term-frequency scaling, i.e., replace the term frequency with `1 + log(TF)`.
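A minimal base-R sketch of sublinear scaling is shown below. It assumes that only nonzero entries are rescaled, which is the natural reading for a sparse matrix where zeros are not stored; the package's actual handling is not confirmed here, and the matrix `tf` is invented for illustration.

```r
# Toy term-frequency matrix (hypothetical data).
tf <- matrix(c(1, 0, 4,
               2, 3, 0),
             nrow = 2, byrow = TRUE)

# Replace each nonzero term frequency with 1 + log(TF).
# Zero entries are left at 0 (assumption), since log(0) is -Inf.
sublinear <- tf
nonzero <- tf > 0
sublinear[nonzero] <- 1 + log(tf[nonzero])

sublinear
```

The effect is to dampen large counts: a term appearing 4 times gets a weight of about 2.39 rather than 4, while a term appearing once keeps a weight of 1.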

norm

`character`. Type of normalization to apply to term vectors. `"l1"` by default, i.e., scale by the number of words in the document.

idf

`ddiMatrix`, a diagonal matrix for IDF scaling. See `get_idf`. If not provided, the IDF scaling matrix will be calculated from the matrix passed to `dtm`.
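To see why a diagonal matrix is a convenient container for IDF weights, consider the base-R sketch below: right-multiplying a document-term matrix by a diagonal matrix scales each term column by its own weight. The weights in `idf_weights` are made up for illustration and are not produced by `get_idf`; a dense `diag()` matrix stands in for `ddiMatrix`.

```r
# Toy document-term matrix (hypothetical data): 2 documents, 3 terms.
dtm_toy <- matrix(c(2, 1, 1,
                    0, 3, 1),
                  nrow = 2, byrow = TRUE)

# Hypothetical per-term weights, one per column of dtm_toy.
idf_weights <- c(1.5, 0.5, 1.0)

# Diagonal scaling matrix with the weights on the diagonal.
idf_diag <- diag(idf_weights)

# Right-multiplication scales column j of dtm_toy by idf_weights[j].
scaled <- dtm_toy %*% idf_diag

scaled
```

Because the scaling matrix depends only on the terms, the same matrix can be reused to transform new documents that share the vocabulary.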

`transform_tfidf`: Scale a document-term matrix via TF-IDF

`transform_binary`: Transform a document-term matrix into binary representation

# NOT RUN {
library(text2vec)
data(movie_review)
txt = movie_review[["review"]][1:1000]
it = itoken(txt, tolower, word_tokenizer)
vocab = vocabulary(it)
# remove very common and uncommon words
pruned_vocab = prune_vocabulary(vocab, term_count_min = 10,
  doc_proportion_max = 0.8, doc_proportion_min = 0.001,
  max_number_of_terms = 20000)

it = itoken(txt, tolower, word_tokenizer)
dtm = create_dtm(it, pruned_vocab)

dtm_filtered = dtm %>%
  # functionality overlaps with prune_vocabulary(),
  # but still can be useful in some cases
  # filter out very common and very uncommon terms
  transform_filter_commons( c(0.001, 0.975) )

# simple term-frequency transformation
transformed_tf = dtm %>%
  transform_tf

# tf-idf transformation
idf = get_idf(dtm)
transformed_tfidf = transform_tfidf(dtm, idf)
# }