splitter

string

The input string to be split.

split.char

Logical; should a split occur after every character?

split.space

Logical; determines whether spaces should be preserved as characters in
the n-gram tokenization. The character(s) used for spaces are
determined by the <code>spacesep</code> argument.

spacesep

The character(s) used to represent a space when
<code>split.space=TRUE</code>. Should not consist only of spaces.

split.punct

Logical; determines whether splits should occur at punctuation.

A utility function for use with n-gram modeling. This function
splits a string based on various options.

preprocessing

An n-gram is a sequence of n "words" taken, in order, from a
body of text.  This package is a collection of utilities for creating,
displaying, summarizing, and "babbling" n-grams.  The
tokenization and babbling are handled by efficient C
code, which can also be built as a standalone library.
The babbler is a simple Markov chain.  The package also includes
a vignette with complete example workflows and information about
the utilities it offers.
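To make the n-gram definition and the Markov-chain idea concrete, here is a minimal sketch in Python. It is an illustration only, not the package's C implementation. The function names `ngrams` and `next_word_counts` are hypothetical, and splitting words on whitespace is an assumption.

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words, in order of appearance."""
    words = text.split()  # assumption: words are separated by whitespace
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def next_word_counts(text, n):
    """For each n-gram, count the words that immediately follow it.

    A Markov-chain babbler could sample its next word from a table like this.
    """
    words = text.split()
    table = {}
    for i in range(len(words) - n):
        key = tuple(words[i:i + n])
        table.setdefault(key, Counter())[words[i + n]] += 1
    return table


if __name__ == "__main__":
    sample = "a b a b a"
    print(ngrams(sample, 2))
    print(next_word_counts(sample, 2))
```

Each n-gram is stored as a tuple so it can serve as a dictionary key in the follow-word table.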

splitter: Character Splitter

Description

Usage

Arguments

Value

Details

Examples
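The following Python sketch shows one way the options documented for `splitter` could interact. It is a hypothetical re-implementation for illustration, not the package's code. The name `splitter_sketch`, the default values, the space-delimited return value, and the behavior when `split_char` is off are all assumptions.

```python
import string as _string


def splitter_sketch(text, split_char=False, split_space=False,
                    spacesep="_", split_punct=False):
    """Split text into tokens and return them joined by single spaces.

    Assumed semantics (not confirmed by the package documentation):
    - split_char:  every non-space character becomes its own token
    - split_space: each space is kept as a token, written as spacesep
    - split_punct: each punctuation mark becomes its own token
    """
    if split_space and spacesep.strip() == "":
        # spacesep must stay distinguishable from the token delimiter
        raise ValueError("spacesep should not consist only of spaces")

    tokens = []
    buffer = ""

    def flush():
        # Emit the characters collected so far as one token
        nonlocal buffer
        if buffer:
            tokens.append(buffer)
            buffer = ""

    for ch in text:
        if ch == " ":
            flush()
            if split_space:
                tokens.append(spacesep)
        elif split_char or (split_punct and ch in _string.punctuation):
            flush()
            tokens.append(ch)
        else:
            buffer += ch
    flush()
    return " ".join(tokens)


if __name__ == "__main__":
    print(splitter_sketch("a b", split_char=True, split_space=True))
    print(splitter_sketch("hi, you", split_punct=True))
```

Rejecting a `spacesep` made only of spaces follows the note in the argument description: in this sketch tokens are joined by spaces, so such a marker could not be told apart from the delimiter.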