# transform_tf

##### Scale a document-term matrix

This set of functions scales a document-term matrix.

`transform_tf`

: scale a DTM by one of two methods. If `norm = "l1"`, then `dtm_tf = (count of a particular word in the document) / (total number of words in the document)`. If `norm = "l2"`, then `dtm_tf = (count of a particular word in the document) ^ 2 / (total number of words in the document) ^ 2`.

`transform_binary`

: scale a DTM so that a cell is 1 if a word appears in the document; otherwise it is 0.

`transform_tfidf`

: scale a DTM so that `dtm_idf = log(count of a particular word in a document) / (number of documents where the term appears + 1)`.
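To make the term-frequency and binary formulas above concrete, here is a minimal base R sketch on a toy dense matrix. It follows the descriptions as written and is not the package's sparse-matrix implementation; the names `dtm_toy`, `tf_l1`, `tf_l2`, and `dtm_binary` are illustrative only.

```r
# Toy document-term matrix: rows are documents, columns are terms.
dtm_toy <- matrix(
  c(2, 1, 1,
    0, 3, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("apple", "banana", "cherry"))
)

# Total number of words in each document.
doc_length <- rowSums(dtm_toy)

# "l1" scaling as described: count divided by the document's word total.
tf_l1 <- dtm_toy / doc_length

# "l2" scaling as the description states it: squared count over squared total.
tf_l2 <- dtm_toy^2 / doc_length^2

# Binary representation: 1 where a term occurs in a document, 0 otherwise.
dtm_binary <- (dtm_toy > 0) * 1
```

With `"l1"` scaling every row of `tf_l1` sums to 1, which is a quick sanity check when experimenting with the real functions.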

##### Usage

`transform_tf(dtm, sublinear_tf = FALSE, norm = c("l1", "l2", "none"))`

`transform_tfidf(dtm, idf = NULL, sublinear_tf = FALSE, norm = c("l1", "l2"))`

`transform_binary(dtm)`

##### Arguments

- `dtm`: a document-term matrix of class `dgCMatrix` or `dgTMatrix`.
- `sublinear_tf`: `logical`, `FALSE` by default. Apply sublinear term-frequency scaling, i.e., replace the term frequency with `1 + log(TF)`.
- `norm`: `character`. Type of normalization to apply to term vectors. `"l1"` by default, i.e., scale by the number of words in the document.
- `idf`: `ddiMatrix`, a diagonal matrix for IDF scaling. See `get_idf`. If not provided, the IDF scaling matrix will be calculated from the matrix passed to `dtm`.
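The `idf` argument is a diagonal matrix, which means IDF scaling amounts to a single matrix product that rescales each term column. The base R sketch below illustrates that idea on a toy dense matrix. The IDF weights use a common smoothed form, `log(number of documents / (document frequency + 1))`, which is an assumption made for illustration and not necessarily what `get_idf` computes; all variable names are hypothetical.

```r
# Toy counts: 3 documents x 3 terms.
dtm_toy <- matrix(
  c(2, 1, 0,
    1, 0, 1,
    1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), c("apple", "banana", "cherry"))
)

# Term frequencies with "l1" scaling (counts divided by document length).
tf <- dtm_toy / rowSums(dtm_toy)

# Assumed IDF weights (illustrative, not taken from the package):
# log(number of documents / (number of documents containing the term + 1)).
doc_freq <- colSums(dtm_toy > 0)
idf_weights <- log(nrow(dtm_toy) / (doc_freq + 1))

# Putting the weights on a diagonal lets one product rescale every column.
idf_diag <- diag(idf_weights)
tfidf <- tf %*% idf_diag
colnames(tfidf) <- colnames(dtm_toy)
```

Multiplying by a diagonal matrix on the right is equivalent to multiplying column `j` of `tf` by the `j`-th weight, which is why the same idea carries over cheaply to sparse matrices.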

##### Functions

`transform_tfidf`

: Scale a document-term matrix via TF-IDF

`transform_binary`

: Transform a document-term matrix into binary representation

##### Examples

```
# NOT RUN {
library(text2vec)

data(movie_review)
txt = movie_review[["review"]][1:1000]
it = itoken(txt, tolower, word_tokenizer)
vocab = vocabulary(it)

# remove very common and uncommon words
pruned_vocab = prune_vocabulary(vocab,
                                term_count_min = 10,
                                doc_proportion_max = 0.8, doc_proportion_min = 0.001,
                                max_number_of_terms = 20000)

# re-create the iterator before building the DTM
it = itoken(txt, tolower, word_tokenizer)
dtm = create_dtm(it, pruned_vocab)

dtm_filtered = dtm %>%
  # functionality overlaps with prune_vocabulary(),
  # but still can be useful in some cases
  # filter out very common and very uncommon terms
  transform_filter_commons( c(0.001, 0.975) )

# simple term-frequency transformation
transformed_tf = dtm %>%
  transform_tf()

# tf-idf transformation
idf = get_idf(dtm)
transformed_tfidf = transform_tfidf(dtm, idf)
# }
```

*Documentation reproduced from package text2vec, version 0.4.0, License: GPL (>= 2) | file LICENSE*