mlvocab

mlvocab-package

<p>The <code>mlvocab</code> package provides a two-step abstraction.
First, a vocabulary object is built from the entire corpus with
<code><a rd-options="=vocab" href="/link/vocab()?package=mlvocab&version=0.0.1&to=%3Dvocab" data-mini-rdoc="=vocab::vocab()">vocab()</a></code>, <code><a rd-options="=vocab_update" href="/link/vocab_update()?package=mlvocab&version=0.0.1&to=%3Dvocab_update" data-mini-rdoc="=vocab_update::vocab_update()">vocab_update()</a></code> and <code><a rd-options="=vocab_prune" href="/link/vocab_prune()?package=mlvocab&version=0.0.1&to=%3Dvocab_prune" data-mini-rdoc="=vocab_prune::vocab_prune()">vocab_prune()</a></code>.
Second, the vocabulary is passed alongside the corpus to a
variety of corpus pre-processing functions.</p>

internal

Utilities for preprocessing text corpora into data structures
suitable for natural language models: integer sequences or matrices,
vocabulary embedding matrices, and term-document, document-term and term
co-occurrence matrices, among others. All functions allow for full or
partial hashing of the terms in the vocabulary.
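
As a rough illustration of the two-step abstraction, the base R sketch below builds a term-count table from a toy corpus, prunes it, and then uses it to convert each document into an integer sequence and a row of a document-term matrix. It deliberately does not call mlvocab itself: the names vocab_sketch, pruned and to_index_seq, the pruning threshold, and the use of 0 for out-of-vocabulary terms are all assumptions made for illustration, not the package's API or behavior.

```r
# Conceptual sketch in base R; none of this is the mlvocab API.
# A toy corpus: a named list of already tokenized documents.
corpus <- list(
  a = c("the", "dog", "chased", "the", "cat"),
  b = c("the", "cat", "slept")
)

# Step 1: build the vocabulary (term counts) from the entire corpus.
term_counts <- sort(table(unlist(corpus, use.names = FALSE)), decreasing = TRUE)
vocab_sketch <- data.frame(
  term = names(term_counts),
  term_count = as.integer(term_counts),
  stringsAsFactors = FALSE
)

# Illustrative pruning: keep only terms seen at least twice.
pruned <- vocab_sketch[vocab_sketch$term_count >= 2L, ]

# Step 2: pass the vocabulary alongside the corpus to a preprocessing step.
# Each document becomes a sequence of vocabulary indices; 0 stands in for
# terms missing from the pruned vocabulary (an assumption of this sketch).
to_index_seq <- function(doc, vocab_terms) {
  ix <- match(doc, vocab_terms)
  ix[is.na(ix)] <- 0L
  ix
}
index_seqs <- lapply(corpus, to_index_seq, vocab_terms = pruned$term)

# A document-term count matrix built against the same vocabulary.
doc_term <- t(vapply(
  corpus,
  function(doc) as.integer(table(factor(doc, levels = pruned$term))),
  integer(nrow(pruned))
))
colnames(doc_term) <- pruned$term
```

Keeping the vocabulary as a separate object means every downstream structure (index sequences, matrices) shares one consistent term-to-index mapping.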

Vitalie Spinu

Vocabulary and Corpus Preprocessing for Natural Language Pipelines

mlvocab-package: `mlvocab` package
