# transform_tf

##### Scale a document-term matrix

This set of functions scales a document-term matrix.

transform_tf: scale a DTM by one of two methods. If norm = "l1", then dtm_tf = (count of a particular word in the document) / (total number of words in the document). If norm = "l2", then dtm_tf = (count of a particular word in the document) ^ 2 / (total number of words in the document) ^ 2.
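The two formulas above can be sketched on a small dense matrix in base R. This is only an illustration of the arithmetic as worded here, not the package's implementation; `dtm_toy`, `tf_l1` and `tf_l2_doc` are made-up names, and a real DTM would be a sparse matrix.

```r
# Toy document-term matrix: 2 documents (rows) x 3 terms (columns).
# All names and values are illustrative and not part of text2vec.
dtm_toy = matrix(c(2, 1, 1,
                   0, 3, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2"), c("cat", "dog", "fish")))

# "l1": divide each count by the total number of words in its document.
# A vector of length nrow(dtm_toy) is recycled down the columns,
# so every cell in row i is divided by rowSums(dtm_toy)[i].
tf_l1 = dtm_toy / rowSums(dtm_toy)

# "l2" exactly as worded above: squared count over squared document total.
tf_l2_doc = dtm_toy^2 / rowSums(dtm_toy)^2

print(tf_l1)
print(tf_l2_doc)
```

With "l1" scaling every row of `tf_l1` sums to 1. Note that conventional Euclidean (unit-length) scaling would instead divide each row by `sqrt(rowSums(dtm_toy^2))`; the sketch follows the wording of this page and does not check which variant the package applies.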

transform_binary: scale a DTM so that a cell is 1 if a word appears in the document; otherwise it is 0.
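As a minimal sketch of the binary representation, assuming a small dense matrix of counts (the names `dtm_toy` and `dtm_bin` are illustrative only):

```r
# Toy counts: 2 documents x 3 terms (illustrative values).
dtm_toy = matrix(c(2, 0, 1,
                   0, 3, 0),
                 nrow = 2, byrow = TRUE)

# Any positive count becomes 1; zero cells stay 0.
# The comparison yields a logical matrix, and multiplying by 1 makes it numeric.
dtm_bin = (dtm_toy > 0) * 1

print(dtm_bin)
```

The result only records whether a term occurs in a document, so the counts 2 and 3 both collapse to 1.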

transform_tfidf: scale a DTM so that dtm_idf = log(count of a particular word in a document) / (number of documents where the term appears + 1)

##### Usage
transform_tf(dtm, sublinear_tf = FALSE, norm = c("l1", "l2", "none"))
transform_tfidf(dtm, idf = NULL, sublinear_tf = FALSE, norm = c("l1", "l2"))
transform_binary(dtm)
##### Arguments
dtm

a document-term matrix of class dgCMatrix or dgTMatrix.

sublinear_tf

logical, FALSE by default. Apply sublinear term-frequency scaling, i.e., replace the term frequency with 1 + log(TF).

norm

character. Type of normalization to apply to term vectors. "l1" by default, i.e., scale by the number of words in the document.

idf

ddiMatrix, a diagonal matrix for IDF scaling. See get_idf. If not provided, the IDF scaling matrix is calculated from the matrix passed to dtm.
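The sublinear_tf and idf arguments can be illustrated with the Matrix package, which provides the dgCMatrix and ddiMatrix classes named above. This is a sketch under stated assumptions, not the package's code: the weights in `idf_toy` are hypothetical stand-ins for what get_idf would return, zero cells are assumed to stay zero under sublinear scaling, and right-multiplication is one plausible way a diagonal IDF matrix can be applied.

```r
library(Matrix)

# Toy sparse DTM: 3 documents x 2 terms (illustrative values), stored as dgCMatrix.
dtm_toy = Matrix(c(1, 0,
                   4, 1,
                   0, 3),
                 nrow = 3, byrow = TRUE, sparse = TRUE)

# sublinear_tf: replace each stored (nonzero) count TF with 1 + log(TF).
# Assumption: zero cells are left untouched, which keeps the matrix sparse
# and avoids log(0).
dtm_sub = dtm_toy
dtm_sub@x = 1 + log(dtm_sub@x)

# idf: hypothetical per-term weights placed on a diagonal (a ddiMatrix).
idf_toy = Diagonal(x = c(0.5, 2))

# Right-multiplying by a diagonal matrix rescales each column (term) by its weight.
scaled = dtm_toy %*% idf_toy

print(dtm_sub)
print(scaled)
```

A count of 1 is unchanged by the sublinear transform because log(1) = 0, while larger counts grow only logarithmically.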

##### Functions

• transform_tfidf: Scale a document-term matrix via TF-IDF

• transform_binary: Transform a document-term matrix into binary representation

##### Aliases
• transform_binary
• transform_tf
• transform_tfidf
##### Examples
# NOT RUN {
library(text2vec)
data("movie_review")

txt = movie_review[["review"]][1:1000]
it = itoken(txt, tolower, word_tokenizer)
vocab = vocabulary(it)
# remove very common and uncommon words
pruned_vocab = prune_vocabulary(vocab,
                                term_count_min = 10,
                                doc_proportion_max = 0.8, doc_proportion_min = 0.001,
                                max_number_of_terms = 20000)

it = itoken(txt, tolower, word_tokenizer)
dtm = create_dtm(it, pruned_vocab)

dtm_filtered = dtm %>%
  # functionality overlaps with prune_vocabulary(),
  # but still can be useful in some cases
  # filter out very common and very uncommon terms
  transform_filter_commons(c(0.001, 0.975))

# simple term-frequency transformation
transformed_tf = dtm %>%
  transform_tf

# tf-idf transformation
idf = get_idf(dtm)
transformed_tfidf = transform_tfidf(dtm, idf)
# }

Documentation reproduced from package text2vec, version 0.4.0, License: GPL (>= 2) | file LICENSE
