Note: a newer version (0.6.4) of this package is available.

text2vec (version 0.2.0)

Fast and Modern Text Mining Framework - Vectorization and Word Embeddings

Description

Very fast and memory-friendly tools for text vectorization and for learning word embeddings (GloVe). The package also provides a source-agnostic streaming API, which makes it possible to analyze collections of documents that are much larger than the available RAM.
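
To make the streaming idea concrete, here is a minimal base-R sketch of chunked term counting. It does not use the text2vec API: `count_terms_streaming` and its `chunk_size` argument are hypothetical names chosen for illustration, and the tokenization rule (lowercase, split on non-alphanumeric characters) is an assumption. It only shows the general pattern of reading a connection a few lines at a time, so that memory use depends on the chunk size and the vocabulary rather than on the size of the corpus.

```r
# Hypothetical helper (NOT part of text2vec): counts terms from a connection
# in fixed-size chunks, so at most `chunk_size` lines are held in memory.
count_terms_streaming <- function(con, chunk_size = 2L) {
  counts <- new.env(hash = TRUE)  # running term -> count table
  repeat {
    lines <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(lines) == 0L) break  # connection exhausted
    # Assumed tokenization: lowercase, split on runs of non-alphanumerics
    tokens <- unlist(strsplit(tolower(lines), "[^a-z0-9]+"))
    tokens <- tokens[nzchar(tokens)]  # drop empty strings
    for (tok in tokens) {
      prev <- counts[[tok]]
      counts[[tok]] <- if (is.null(prev)) 1L else prev + 1L
    }
  }
  # Return a named integer vector, sorted by term name
  vapply(mget(ls(counts), envir = counts), identity, integer(1))
}

# Usage: a text connection stands in for a large file on disk
con <- textConnection(c("The cat sat", "the dog sat", "A cat ran"))
term_counts <- count_terms_streaming(con)
close(con)
print(term_counts)
```

The same loop works unchanged with `file("corpus.txt", open = "r")` in place of the text connection (the file name is a placeholder). The package's own iterators, such as `ilines` and `ifiles` listed below, serve this role inside text2vec.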

Install

install.packages('text2vec')

Monthly Downloads

8,738

Version

0.2.0

License

MIT + file LICENSE

Maintainer

Dmitriy Selivanov

Last Published

January 10th, 2016

Functions in text2vec (0.2.0)

prune_vocabulary

Prunes vocabulary.
GloveFitter

Rcpp module: GloveFitter. Exposes C++ functions to fit the GloVe model.
feature_hasher

Creates meta information about feature hashing
movie_review

5000 IMDB movie reviews.
create_vocab_corpus

RAM-friendly streaming corpus construction.
tokenizers

Tokenization functions, which perform string splitting.
get_dtm

Creates Document-Term matrix
to_lda_c

Converts 'dgCMatrix' to 'lda_c' format
glove

Fits the GloVe model.
split_vector

Generating indexes for splitting vector into chunks
tf_transformer

Scales Document-Term matrix
HashCorpus

Rcpp module: HashCorpus. Exposes C++ functions to construct a hashed Document-Term matrix.
ilines

Creates iterator over lines of connection/file
reexports

Objects exported from other packages
get_tcm

Creates Term-Co-occurrence matrix.
check_analogue_accuracy

Checks accuracy of word embeddings on analogue task.
ifiles

Creates iterator over text/serialized files from the disk
filter_commons_transformer

Removes (un)common terms from Document-Term matrix.
vocabulary

Creates vocabulary (unique terms)
dtm_get_idf

Inverse Document-Frequency scaling matrix construction
prepare_analogue_questions

Prepares questions list from questions-words.txt format.
text2vec

The text2vec package.
dtm_get_tf

Term-Frequency scaling matrix construction from Document-Term matrix.
itoken

Creates iterator over input object.
VocabCorpus

Rcpp module: VocabCorpus. Exposes C++ functions to construct a Document-Term matrix.
VocabularyBuilder

Rcpp module: VocabularyBuilder. Exposes C++ functions to construct a vocabulary.