create_corpus

iterator over a <code>list</code> of <code>character</code> vectors. Each
element is a list of tokens, that is, tokenized and normalized strings.

iterator

<code>function</code> vectorizer function. See
<a rd-options="" href="/link/vectorizers?package=text2vec&version=0.3.0" data-mini-rdoc="text2vec::vectorizers">vectorizers</a>.

vectorizer


This function creates corpus objects (based on a vocabulary or on
  hashes), which are stored outside of R's heap and wrapped via reference
  classes using Rcpp-Modules. From those objects you can easily extract
  document-term (DTM) and term-co-occurrence (TCM) matrices. Also, text2vec
  grows the corpus for DTM and TCM matrices simultaneously in a RAM-friendly
  and efficient way using the iterators abstraction. You can build corpora
  from objects or files which are orders of magnitude larger than available
  RAM.
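
The idea of growing a corpus chunk by chunk from an iterator can be illustrated with a minimal base R sketch. It does not use text2vec or reproduce its internals: the helpers make_token_iterator and build_dtm are hypothetical names introduced only for this illustration, and the result is a small dense matrix rather than the out-of-heap structure described above.

```r
# Hypothetical helper: yields tokenized documents one chunk at a time,
# so only chunk_size documents are tokenized at any moment.
make_token_iterator <- function(docs, chunk_size = 2L) {
  starts <- seq(1L, length(docs), by = chunk_size)
  i <- 0L
  list(
    has_next = function() i < length(starts),
    next_chunk = function() {
      i <<- i + 1L
      idx <- starts[i]:min(starts[i] + chunk_size - 1L, length(docs))
      # Normalize (lowercase) and tokenize on whitespace.
      strsplit(tolower(docs[idx]), "[[:space:]]+")
    }
  )
}

# Hypothetical helper: grows a vocabulary and (row, column, count) triplets
# while consuming the iterator, then assembles a document-term matrix.
build_dtm <- function(it) {
  vocab <- character(0)
  rows <- integer(0)
  cols <- integer(0)
  vals <- numeric(0)
  doc_id <- 0L
  while (it$has_next()) {
    for (tokens in it$next_chunk()) {
      doc_id <- doc_id + 1L
      counts <- table(tokens)
      # Append terms not seen before to the end of the vocabulary.
      vocab <- c(vocab, setdiff(names(counts), vocab))
      rows <- c(rows, rep(doc_id, length(counts)))
      cols <- c(cols, match(names(counts), vocab))
      vals <- c(vals, as.numeric(counts))
    }
  }
  # Dense matrix for readability only; a sparse format would be used in practice.
  dtm <- matrix(0, nrow = doc_id, ncol = length(vocab),
                dimnames = list(NULL, vocab))
  dtm[cbind(rows, cols)] <- vals
  dtm
}

docs <- c("The cat sat", "the dog sat", "A cat and a dog")
it <- make_token_iterator(docs, chunk_size = 2L)
dtm <- build_dtm(it)
print(dtm)
```

Because the loop only ever holds one chunk of tokens at a time, the same pattern would apply if the iterator read documents from a file instead of an in-memory vector.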


Very fast and memory-friendly tools for text vectorization and
state-of-the-art word embeddings (GloVe). This package provides a
source-agnostic streaming API, which allows researchers to perform analysis
of collections of documents which are much larger than available RAM. All
core functions are parallelized to benefit from multicore machines.

create_corpus: Create a corpus
