text2vec (version 0.2.0)

get_tcm: Term-co-occurrence matrix construction

Description

Creates a term-co-occurrence matrix from a Corpus object.

Usage

get_tcm(corpus)

Arguments

corpus
A HashCorpus or VocabCorpus object. See create_vocab_corpus and create_hash_corpus for details.

See Also

create_vocab_corpus, create_hash_corpus

Examples

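library(text2vec)
data("movie_review")

txt <- movie_review[['review']][1:1000]
it <- itoken(txt, tolower, word_tokenizer)
vocab <- vocabulary(it)
# remove very common and very uncommon words
pruned_vocab <- prune_vocabulary(vocab, term_count_min = 10,
  doc_proportion_max = 0.8, doc_proportion_min = 0.001,
  max_number_of_terms = 5000)

# create a fresh iterator for the second pass over the text
it <- itoken(txt, tolower, word_tokenizer)
# grow_dtm = FALSE: do not build the document-term matrix;
# skip_grams_window = 5: count co-occurrences within a 5-term window
corpus <- create_vocab_corpus(it, pruned_vocab, grow_dtm = FALSE, skip_grams_window = 5)
tcm <- get_tcm(corpus)
# dimensions of the term-co-occurrence matrix
dim(tcm)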
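
To make it concrete what a term-co-occurrence matrix holds, here is a minimal base-R sketch that counts co-occurrences on a toy corpus. It does not use text2vec and is not the package's implementation: the names docs, window and toy_tcm are hypothetical, and the sketch uses raw, unweighted counts, whereas text2vec may weight pairs differently (for example by distance).

```r
# Toy input: each document is already split into lowercase tokens.
docs <- list(
  c("the", "cat", "sat"),
  c("the", "dog", "sat")
)
window <- 2L  # hypothetical window size, analogous in spirit to skip_grams_window

# One row and one column per distinct term.
terms <- sort(unique(unlist(docs)))
toy_tcm <- matrix(0, nrow = length(terms), ncol = length(terms),
                  dimnames = list(terms, terms))

for (tokens in docs) {
  n <- length(tokens)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # Count each pair once: j lies to the right of i, within the window.
      if (j > i && j - i <= window) {
        a <- tokens[i]
        b <- tokens[j]
        # Record the pair in both directions so the matrix is symmetric.
        toy_tcm[a, b] <- toy_tcm[a, b] + 1
        toy_tcm[b, a] <- toy_tcm[b, a] + 1
      }
    }
  }
}

print(toy_tcm)
```

Each cell counts how often the row term and the column term appear within window positions of each other in the same document. A pair that never shares a window, such as "cat" and "dog" here, stays at zero.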
