create_dtm

create_dtm.itoken

create_dtm.list

<a rd-options="" href="/link/itoken?package=text2vec&version=0.4.0" data-mini-rdoc="text2vec::itoken">itoken</a> iterator or <code>list</code> of <code>itoken</code> iterators.

<code>function</code> vectorizer function; see
<a rd-options="" href="/link/vectorizers?package=text2vec&version=0.4.0" data-mini-rdoc="text2vec::vectorizers">vectorizers</a>.

vectorizer

<code>character</code>, one of <code>c("dgCMatrix", "dgTMatrix",
"lda_c")</code>. <code>"lda_c"</code> is Blei's lda-c format (a list of 2 *
doc_terms_size); see
<a href="https://www.cs.princeton.edu/~blei/lda-c/readme.txt">https://www.cs.princeton.edu/~blei/lda-c/readme.txt</a>

type

arguments to the <a rd-options="" href="/link/foreach?package=text2vec&version=0.4.0" data-mini-rdoc="text2vec::foreach">foreach</a> function which is used to iterate
over <code>it</code>.

<code>logical</code> print status messages

verbose

This is a high-level function for creating a document-term
 matrix.

Fast and memory-friendly tools for text vectorization,
topic modeling (LDA, LSA), word embeddings (GloVe), similarities.
This package provides a source-agnostic streaming API, which allows researchers
to perform analysis of collections of documents which are larger than available RAM.
All core functions are parallelized to benefit from multicore machines.

Dmitriy Selivanov

text2vec

Modern Text Mining Framework for R

Lincoln Mullen

create_dtm function

<a rd-options='' href='itoken'>itoken</a> iterator or <code>list</code> of <code>itoken</code> iterators.

<code>function</code> vectorizer function; see
<a rd-options='' href='vectorizers'>vectorizers</a>.

<code>character</code>, one of <code>c("dgCMatrix", "dgTMatrix",
"lda_c")</code>. <code>"lda_c"</code> is Blei's lda-c format (a list of 2 *
doc_terms_size); see
<a href='https://www.cs.princeton.edu/~blei/lda-c/readme.txt'>https://www.cs.princeton.edu/~blei/lda-c/readme.txt</a>

arguments to the <a rd-options='' href='foreach'>foreach</a> function which is used to iterate
over <code>it</code>.

create_dtm: Document-term matrix construction

Description

Usage

Arguments

Value

Details

See Also

Examples