Tutorials

To learn how to use this package, see the package vignettes.

Text vectorization: vignette("text-vectorization", package = "text2vec")
GloVe word embeddings: vignette("glove", package = "text2vec")

See also the text2vec articles on my blog.

Features

text2vec is a package that provides an efficient framework with a concise API for text analysis and natural language processing (NLP) in R. It is inspired by gensim, an excellent Python library for NLP.

The core functionality currently includes:

Fast text vectorization on arbitrary n-grams, using vocabulary or feature hashing.
State-of-the-art GloVe word embeddings.
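
As a rough illustration of the vectorization feature, the sketch below builds a document-term matrix over unigrams and bigrams in two ways: once from a vocabulary and once with feature hashing. It is written against the API of a recent text2vec release (itoken, create_vocabulary, vocab_vectorizer, hash_vectorizer, create_dtm). Argument names such as preprocessor may differ in version 0.3.0, and the two example documents are made up, so treat this as an assumption-laden sketch and check the vignettes for the exact calls in your installed version.

```r
library(text2vec)

# Two toy documents, invented for this example.
docs <- c("The quick brown fox", "The lazy dog")

# An iterator over tokenized documents. The argument name `preprocessor`
# follows recent text2vec releases and is an assumption for 0.3.0.
it <- itoken(docs,
             preprocessor = tolower,
             tokenizer = word_tokenizer,
             progressbar = FALSE)

# Vocabulary-based vectorization over unigrams and bigrams.
vocab <- create_vocabulary(it, ngram = c(1L, 2L))
vectorizer <- vocab_vectorizer(vocab)
dtm <- create_dtm(it, vectorizer)

# Feature hashing: no vocabulary pass is needed, and the number of
# columns is fixed in advance by hash_size.
h_vectorizer <- hash_vectorizer(hash_size = 2^10, ngram = c(1L, 2L))
dtm_hashed <- create_dtm(it, h_vectorizer)

dim(dtm)
dim(dtm_hashed)
```

The vocabulary route keeps readable column names at the cost of one extra pass over the data. The hashing route skips that pass and bounds memory use, but distinct n-grams can collide in the same column.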

The core of this package is carefully written in C++, which makes text2vec fast and memory-friendly. Some parts (GloVe training) are fully parallelized using the excellent RcppParallel package, so parallel processing works on OS X, Linux, Windows and Solaris (x86) without any additional hacking or tricks. In addition, there is higher-level parallelization for text vectorization and vocabulary construction, built on top of the foreach package. text2vec also has a streaming API, so users don't have to load all of the data into RAM.

The API is built around the iterator abstraction and is deliberately concise: it provides only a few functions, each of which does its job well. The package does not provide trivial, very high-level functions, and probably never will, but other packages can build on top of the framework that text2vec provides.

Contributing

The package has an issue tracker on GitHub, where I file feature requests and notes for future work. Any ideas are appreciated.

Contributors are welcome. You can help by

testing and leaving feedback on the GitHub issue tracker (preferably) or directly by e-mail.
forking and contributing. Vignettes, docs, tests, and use cases are very welcome.
giving the project a star on its GitHub page :-)

Functions in text2vec (0.3.0)
