mlvocab

mlvocab-package

<p>The <code>mlvocab</code> package provides a two-step abstraction. First,
a vocabulary object is built from the entire corpus with the
<code><a rd-options="=vocab" href="/link/vocab()?package=mlvocab&version=0.1&to=%3Dvocab" data-mini-rdoc="=vocab::vocab()">vocab()</a></code>, <code><a rd-options="=update_vocab" href="/link/update_vocab()?package=mlvocab&version=0.1&to=%3Dupdate_vocab" data-mini-rdoc="=update_vocab::update_vocab()">update_vocab()</a></code> and <code><a rd-options="=prune_vocab" href="/link/prune_vocab()?package=mlvocab&version=0.1&to=%3Dprune_vocab" data-mini-rdoc="=prune_vocab::prune_vocab()">prune_vocab()</a></code>
functions. Second, the vocabulary is passed alongside the corpus to one of a
variety of corpus preprocessing functions.</p>
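To make the two-step workflow concrete, here is a minimal base-R sketch. It deliberately does not call mlvocab, because the exact signatures of `vocab()`, `prune_vocab()` and the preprocessing functions are not given here. `toy_vocab()`, `toy_prune()` and `toy_tix_seq()` are hypothetical stand-ins, and mapping unknown terms to index 0 is an assumption made only for this illustration.

```r
# A minimal base-R sketch of the two-step idea; NOT the mlvocab implementation.
# toy_vocab(), toy_prune() and toy_tix_seq() are hypothetical helpers.

# Step 1: build a vocabulary (term and count) from the whole corpus.
toy_vocab <- function(corpus) {
  terms <- unlist(corpus, use.names = FALSE)
  counts <- table(terms)
  # Sort by decreasing frequency; ties are broken alphabetically.
  counts <- counts[order(-as.integer(counts), names(counts))]
  data.frame(term = names(counts), term_count = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Optional pruning step: keep only terms seen at least `min_count` times.
toy_prune <- function(v, min_count = 1L) {
  v[v$term_count >= min_count, , drop = FALSE]
}

# Step 2: pass the vocabulary alongside the corpus to a preprocessing
# function; here each document becomes a vector of integer term indices.
# Out-of-vocabulary terms map to 0L (an assumption of this sketch).
toy_tix_seq <- function(corpus, v) {
  lapply(corpus, function(doc) {
    ix <- match(doc, v$term)
    ix[is.na(ix)] <- 0L
    ix
  })
}

corpus <- list(a = c("the", "cat", "sat"),
               b = c("the", "dog", "sat", "down"))
v <- toy_prune(toy_vocab(corpus), min_count = 2L)
ids <- toy_tix_seq(corpus, v)
```

Because the vocabulary is built once from the whole corpus, every document is indexed against the same term-to-integer mapping, which keeps the resulting sequences comparable across documents.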

Utilities for preprocessing text corpora into data structures suitable for
natural language models: integer sequences or matrices, vocabulary embedding
matrices, and term-document, document-term and term co-occurrence matrices,
among others. All functions support full or partial hashing of the terms in
the vocabulary.
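The hashing mentioned above can be illustrated with the generic "hashing trick". The sketch below is an assumption about the general technique only, not mlvocab's actual hash function, bucket layout or argument names: in-vocabulary terms keep their own indices, while all other terms share a fixed number of hashed bucket indices, and an empty vocabulary turns this into full hashing. `toy_hash()`, `toy_hashed_ids()` and `nbuckets` are hypothetical.

```r
# Hypothetical helper: a simple polynomial string hash (not mlvocab's hash).
toy_hash <- function(term, nbuckets) {
  h <- 0
  for (code in utf8ToInt(term)) {
    # Reducing modulo a prime keeps h exact in double precision.
    h <- (h * 31 + code) %% 2147483647
  }
  as.integer(h %% nbuckets)
}

# Partial hashing: in-vocabulary terms keep indices 1..n; every other term
# is hashed into one of `nbuckets` extra indices n+1..n+nbuckets.
# With an empty vocabulary (n = 0) this degenerates to full hashing.
toy_hashed_ids <- function(doc, vocab_terms, nbuckets) {
  n <- length(vocab_terms)
  ix <- match(doc, vocab_terms)
  oov <- is.na(ix)
  ix[oov] <- n + 1L + vapply(doc[oov], toy_hash, integer(1),
                             nbuckets = nbuckets, USE.NAMES = FALSE)
  ix
}

# Partial hashing: "the" and "cat" are in the vocabulary, "zebra" is hashed.
ids <- toy_hashed_ids(c("the", "zebra", "cat"), c("the", "cat"), nbuckets = 8L)
# Full hashing: no vocabulary, every term lands in one of the 8 buckets.
full <- toy_hashed_ids(c("the", "zebra"), character(0), nbuckets = 8L)
```

The trade-off is the usual one for hashing: the number of columns stays fixed regardless of how many unseen terms appear, at the cost of occasional collisions between hashed terms.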

Vitalie Spinu

Vocabulary and Corpus Preprocessing for Natural Language Pipelines


