Tokenize: n-gram Tokenization

<p><code>sep</code>: A set of separator characters for the "words". See the
Details section of the <code>ngram()</code> documentation for how this works;
it behaves a little differently from the <code>sep</code> arguments of other R functions.</p>
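To illustrate the "set of separator characters" idea, here is a minimal base-R sketch. It is not the package's implementation. The helper split_on_chars() is hypothetical: it treats every character of sep as a separator on its own and, as an assumption of this sketch, collapses runs of consecutive separators so that no empty words are produced. Base strsplit(), by contrast, treats its split argument as a single pattern.

```r
# Hypothetical helper for illustration; not part of the ngram package.
# Each character of `sep` acts as a separator on its own, and runs of
# consecutive separators are collapsed (an assumption of this sketch).
split_on_chars <- function(x, sep = " ") {
  seps   <- strsplit(sep, "")[[1]]   # e.g. " ," -> c(" ", ",")
  chars  <- strsplit(x, "")[[1]]
  runs   <- rle(chars %in% seps)     # alternating runs of word / separator characters
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  words  <- which(!runs$values)      # runs made only of non-separator characters
  vapply(words,
         function(i) paste(chars[starts[i]:ends[i]], collapse = ""),
         character(1))
}

split_on_chars("a b,,c", sep = " ,")
#> [1] "a" "b" "c"
strsplit("a b,,c", " ,")[[1]]   # " ," is matched as one pattern, so nothing is split
#> [1] "a b,,c"
```

With sep = " ,", both the space and the comma end a word, which is the behavior a "set of separator characters" suggests.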

<p>The <code>ngram()</code> function is the main workhorse of this package.  It takes
an input string and converts it into the internal n-gram representation.</p>
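The core idea of turning a string into n-grams can be sketched in base R. This is not the package's internal representation: the hypothetical make_ngrams() splits on single spaces, slides a window of n consecutive words across the result, and the counts come from table(). With the package itself, the starting point would be a call such as ngram(str, n = 2).

```r
# Hypothetical sketch of n-gram extraction; not the ngram package's C code.
make_ngrams <- function(str, n = 2) {
  words <- strsplit(str, " ", fixed = TRUE)[[1]]
  words <- words[words != ""]          # drop empty strings left by repeated spaces
  if (length(words) < n) return(character(0))
  # slide a window of n consecutive words across the text
  vapply(seq_len(length(words) - n + 1),
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

grams <- make_ngrams("a b c a b", n = 2)
grams
#> [1] "a b" "b c" "c a" "a b"
table(grams)   # frequency of each distinct 2-gram
```

A text with fewer than n words yields no n-grams at all, which the sketch returns as character(0).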

Tokenization

An n-gram is a sequence of n "words" taken, in order, from a
body of text.  This package is a collection of utilities for creating,
displaying, summarizing, and "babbling" n-grams.  Tokenization
and babbling are handled by efficient C code, which can also be
built as a standalone library.  The babbler is a simple Markov
chain.  The package also includes a vignette with complete example
workflows and information about the utilities it offers.
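The description of the babbler as a simple Markov chain can be made concrete with a first-order (word-bigram) sketch in base R. The function babble_sketch() is hypothetical: it records, for each word, the words observed directly after it, then generates text by repeatedly sampling a successor of the current word, restarting at a random word when the current word has no recorded successor. The package's own babble() works from its n-gram table in C and may differ in detail.

```r
# Hypothetical first-order Markov babbler; not the ngram package's C code.
babble_sketch <- function(words, genlen = 10, seed = 1) {
  set.seed(seed)   # make the random walk reproducible
  # successors[["w"]] holds every word observed directly after "w"
  successors <- split(words[-1], words[-length(words)])
  out <- character(genlen)
  current <- sample(names(successors), 1)
  for (i in seq_len(genlen)) {
    out[i] <- current
    nxt <- successors[[current]]
    if (is.null(nxt)) {
      # dead end: the word only appeared at the end of the text
      current <- sample(names(successors), 1)
    } else {
      # duplicates in nxt make frequent successors proportionally more likely
      current <- nxt[sample.int(length(nxt), 1)]
    }
  }
  out
}

words <- strsplit("the cat sat on the mat and the cat ran", " ")[[1]]
babble_sketch(words, genlen = 8)
```

Indexing with sample.int() rather than calling sample(nxt, 1) avoids R's special case where sample() on a single number draws from 1:x.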