ngram-class

An n-gram is an ordered sequence of n "words" taken from a body of "text".
The terms "words" and "text" can be interpreted literally or more
loosely; for instance, the "words" may be individual characters.
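
As a minimal sketch of the definition above, the following Python snippet slides a window of length n over a list of tokens. The helper name `ngrams` and the whitespace-based splitting are assumptions made for illustration; they are not taken from the package.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens, in order of appearance."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of len(tokens) items contains len(tokens) - n + 1 windows;
    # if the sequence is shorter than n, the range is empty.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Literal interpretation: "words" are whitespace-separated words.
word_bigrams = ngrams("a b a b c".split(), 2)

# Looser interpretation: "words" are individual characters.
char_trigrams = ngrams(list("abcd"), 3)
```

Here the same function serves both interpretations because it only relies on the input being an ordered sequence.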

Tokenization

An n-gram is a sequence of n "words" taken, in order, from a
body of text.  This package is a collection of utilities for creating,
displaying, summarizing, and "babbling" n-grams.  The
'tokenization' and "babbling" are handled by efficient C
code, which can also be built as a standalone library.
The babbler is a simple Markov chain.  The package also includes
a vignette with complete example 'workflows' and information about
the utilities it provides.
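
To illustrate what "a simple Markov chain" can mean for a babbler, here is a rough Python sketch. It is not the package's C implementation; the names `build_chain` and `babble_sketch`, the whitespace tokenization, and the restart-on-dead-end rule are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def build_chain(tokens, n):
    """Map each n-gram to the list of words observed right after it."""
    chain = defaultdict(list)
    for i in range(len(tokens) - n):
        chain[tuple(tokens[i:i + n])].append(tokens[i + n])
    return chain


def babble_sketch(tokens, n, genlen, seed=None):
    """Generate genlen words by walking the chain from a random n-gram."""
    rng = random.Random(seed)
    chain = build_chain(tokens, n)
    if not chain:
        return []
    # Sorting the keys makes the choice reproducible for a given seed.
    state = rng.choice(sorted(chain))
    output = list(state)
    while len(output) < genlen:
        followers = chain.get(state)
        if not followers:
            # Dead end (assumed rule): restart from a random n-gram.
            state = rng.choice(sorted(chain))
            output.extend(state)
            continue
        nxt = rng.choice(followers)
        output.append(nxt)
        # The next state drops the oldest word and appends the new one.
        state = state[1:] + (nxt,)
    return output[:genlen]


tokens = "a b c a b d".split()
generated = babble_sketch(tokens, n=2, genlen=10, seed=1)
```

Each step depends only on the current n-gram, which is the Markov property; repeated followers are kept in the list so that more frequent continuations are chosen more often.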

Drew Schmidt

ngram

Fast n-Gram 'Tokenization'

Christian Heckendorf

ngram-class function

<dl class="dl-horizontal">
<dt><code>str_ptr</code></dt><dd>A pointer to a copy of the original input string.</dd>
<dt><code>strlen</code></dt><dd>The length of the string.</dd>
<dt><code>n</code></dt><dd>The eponymous 'n' as in 'n-gram'.</dd>
<dt><code>ngl_ptr</code></dt><dd>A pointer to the processed list of n-grams.</dd>
<dt><code>ngsize</code></dt><dd>The length of the ngram list, or in other words, the number of unique n-grams in the input string.</dd>
<dt><code>sl_ptr</code></dt><dd>A pointer to the list of words from the input string.</dd>
</dl>
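
The quantities these slots describe can be sketched in Python as follows. This is only an illustration: the real class holds pointers to C structures, whereas the hypothetical `NgramSummary` below stores plain values, splits on whitespace, and measures `strlen` in characters, all of which are assumptions.

```python
from dataclasses import dataclass


@dataclass
class NgramSummary:
    strlen: int  # length of the input string (characters, by assumption)
    n: int       # the 'n' in 'n-gram'
    ngsize: int  # number of unique n-grams in the input string
    words: list  # plain-list stand-in for what sl_ptr points to


def summarize(text, n):
    """Compute the slot-like quantities for a whitespace-tokenized string."""
    words = text.split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    # ngsize counts distinct n-grams, so duplicates collapse in the set.
    return NgramSummary(strlen=len(text), n=n, ngsize=len(set(grams)), words=words)


summary = summarize("a b a b c", 2)
```

For the input above, the bigram ("a", "b") occurs twice, so `ngsize` is smaller than the total number of bigram positions.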

ngram-class: Class ngram

Description

Arguments

Slots

Details

See Also