nGram

logical: if FALSE, n-gram matching is case sensitive;
if TRUE, case is ignored during matching.

ignoreCase

character or string that divides one word from the next. 
You can use a regular expression as the <code>delimiter</code> value.

delimiter

a regular expression that specifies the punctuation characters 
the parser removes before it evaluates the input text.

punctuation
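
To make the interplay of <code>ignoreCase</code>, <code>delimiter</code>, and <code>punctuation</code> concrete, here is a minimal base R sketch that mimics the preprocessing locally. The helper <code>tokenize_sketch</code> and its defaults are hypothetical. It assumes that ignoring case amounts to lower-casing the text, and it is not the parser that runs in Aster.

```r
# Hypothetical local helper: illustrates the arguments, not the Aster parser.
tokenize_sketch <- function(text, delimiter = "[ \t]+", punctuation = NULL,
                            ignoreCase = FALSE) {
  # Remove punctuation first, as the punctuation argument describes.
  if (!is.null(punctuation)) text <- gsub(punctuation, "", text)
  # Assumption: ignoring case is modelled here by lower-casing the text.
  if (ignoreCase) text <- tolower(text)
  # Split into words on the delimiter regular expression.
  tokens <- strsplit(text, delimiter)[[1]]
  tokens[nzchar(tokens)]
}

tokens <- tokenize_sketch("Hello, World! Hello again",
                          punctuation = "[,!]", ignoreCase = TRUE)
print(tokens)
```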

logical: if TRUE, overlapping n-grams are allowed.

overlapping
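
The difference between overlapping and non-overlapping n-grams can be sketched in base R as a change in window step. The helper <code>make_ngrams_sketch</code> is hypothetical, and the non-overlapping behaviour shown (advance by n words, drop any leftover words) is an assumption about the semantics rather than a description of the Aster implementation.

```r
# Hypothetical local helper: slides a window of n words over the tokens.
make_ngrams_sketch <- function(tokens, n, overlapping = TRUE) {
  if (length(tokens) < n) return(character(0))
  # Overlapping windows advance by one word; non-overlapping by n words (assumed).
  step <- if (overlapping) 1L else n
  starts <- seq(1L, length(tokens) - n + 1L, by = step)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1L)], collapse = " "),
         character(1))
}

words <- c("a", "b", "c", "d")
with_overlap <- make_ngrams_sketch(words, 2)
without_overlap <- make_ngrams_sketch(words, 2, overlapping = FALSE)
print(with_overlap)
print(without_overlap)
```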

a regular expression listing one or more punctuation characters or 
strings, any of which the <code>nGram</code> parser will recognize as the end of a sentence 
of text. The end of each sentence resets the search for n-grams, meaning that 
<code>nGram</code> discards any partial n-grams and proceeds to the next sentence to search 
for the next n-gram. In other words, no n-gram can span two sentences.

reset
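
The reset rule (no n-gram spans two sentences) can be illustrated with a small base R sketch. The helper <code>ngrams_with_reset_sketch</code>, its default pattern, and the sample text are hypothetical. The sketch splits the text on the reset pattern first and builds overlapping n-grams inside each sentence only.

```r
# Hypothetical local helper: n-grams never cross a sentence boundary.
ngrams_with_reset_sketch <- function(text, n, reset = "[.?!]") {
  # Each match of the reset pattern ends a sentence.
  sentences <- strsplit(text, reset)[[1]]
  unlist(lapply(sentences, function(s) {
    tokens <- strsplit(trimws(s), "[ \t]+")[[1]]
    tokens <- tokens[nzchar(tokens)]
    # A sentence with fewer than n words yields no n-grams.
    if (length(tokens) < n) return(character(0))
    starts <- seq_len(length(tokens) - n + 1L)
    vapply(starts,
           function(i) paste(tokens[i:(i + n - 1L)], collapse = " "),
           character(1))
  }))
}

bigrams <- ngrams_with_reset_sketch("the cat sat. it slept", 2)
print(bigrams)
```

Note that "sat it" does not appear in the result, because the period resets the search.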

a character string to separate multiple text columns.

minimum length of the words in an n-gram. N-grams that contain words 
shorter than the limit are omitted. The current implementation is incomplete: it
filters out only n-grams in which every word is shorter than the minimum length, i.e. the total 
length of the n-gram is below n*minLength + (n-1).

minLength
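
The length threshold can be checked by hand with a short base R sketch. It reads the description literally as a test on the total character length of the n-gram, assuming words are joined by a single-character separator. The helper <code>passes_min_length_sketch</code> is hypothetical and only illustrates the arithmetic.

```r
# Hypothetical local helper: total-length check from the description above.
passes_min_length_sketch <- function(ngram, n, minLength) {
  # n words of at least minLength characters plus (n - 1) one-character separators.
  nchar(ngram) >= n * minLength + (n - 1)
}

# Both words are shorter than 3 characters: 5 < 2*3 + 1, so it is filtered out.
short_kept <- passes_min_length_sketch("to be", n = 2, minLength = 3)
# "a" is shorter than 3 characters, but the total length 9 >= 7, so it is kept.
mixed_kept <- passes_min_length_sketch("a lengthy", n = 2, minLength = 3)
print(c(short_kept, mixed_kept))
```

The second case shows why the filter is described as incomplete: an n-gram with one short word can still pass when another word is long enough.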


Tokenize (or split) text and emit multi-grams.


A consistent set of tools to perform in-database analytics
on the Teradata Aster Big Data Discovery Platform. toaster (a.k.a. 'to Aster')
embraces a simple two-step approach: compute in Aster, then visualize and analyze
in R. Its `compute` functions use a combination of parallel SQL, SQL-MR, and
SQL-GR executing in the Aster database, a highly scalable parallel
and distributed analytical platform. Its `create` functions then visualize
results with boxplots, scatterplots, histograms, heatmaps, word clouds,
maps, networks, or slope graphs. Advanced options such as faceting, coloring,
and labeling are supported by most visualizations.

nGram: Tokenize (or split) text and emit multi-grams.

Description

Usage

Arguments

Value