token

a character string to separate the tokens when <code>n > 1</code>

tokenSep

logical: treat text as-is (<code>FALSE</code>) or convert to all lowercase
(<code>TRUE</code>); default is <code>TRUE</code>. Note that if <code>stemming</code> is set to 
<code>TRUE</code>, tokens are always converted to lowercase, so this option 
is ignored.

ignoreCase

character or string that divides one word from the next. 
You can use a regular expression as the <code>delimiter</code> value.

delimiter

a regular expression that specifies the punctuation characters 
the parser removes before it evaluates the input text.

punctuation

logical: if <code>TRUE</code>, apply Porter2 stemming to each token to reduce 
it to its root form. Default is <code>FALSE</code>.

stemming

logical or string with the name of the file that contains stop words,
that is, the words that should be ignored when parsing text.
Each stop word is specified on a separate line.

stopWords

a character string to separate multiple text columns.

exclude tokens shorter than <code>minLength</code> characters.

minLength
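
The text-cleaning arguments above can be illustrated locally in base R. This is only a sketch of the documented semantics, not the package's implementation (which runs inside the Aster database). The helper `tokenize_local` and all of its default values are hypothetical and chosen for illustration.

```r
# Hypothetical local helper; NOT part of toaster. It mimics the documented
# meaning of ignoreCase, delimiter, punctuation and minLength on one string.
# The default regular expressions below are assumptions, not package defaults.
tokenize_local <- function(text, ignoreCase = TRUE, delimiter = "[ \t\r\n]+",
                           punctuation = "[.,!?;:]", minLength = 1) {
  if (ignoreCase) text <- tolower(text)        # convert to all lowercase
  text <- gsub(punctuation, "", text)          # remove punctuation characters
  tokens <- strsplit(text, delimiter)[[1]]     # split on the delimiter regex
  tokens <- tokens[nchar(tokens) > 0]          # drop empty strings
  tokens[nchar(tokens) >= minLength]           # exclude tokens that are too short
}

tokenize_local("The quick, brown fox!", minLength = 4)
# "quick" "brown"
```

Stemming and stop-word filtering are left out of the sketch because they depend on resources (a stemmer and a stop-word file) that are not shown here.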


When <code>n = 1</code>, the function simply tokenizes the text and emits words with their counts.
When <code>n > 1</code>, the tokenized words are combined into permutations of length <code>n</code> within
each document.
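
A small base R sketch may help picture the two modes. It is not the package's implementation, and it rests on a stated assumption: for `n > 1` it treats the result as unordered combinations of the distinct words in one document, joined with `tokenSep`. The helper `ngram_local` and the `"+"` separator are hypothetical.

```r
# Hypothetical local helper; NOT the package's implementation.
# Assumption: for n > 1, the n-word combinations are unordered combinations
# of the distinct tokens in one document, joined with tokenSep.
ngram_local <- function(tokens, n = 1, tokenSep = "+") {
  if (n == 1) {
    return(table(tokens))                      # words with their counts
  }
  words <- sort(unique(tokens))
  if (length(words) < n) return(character(0))  # too few words to combine
  combn(words, n, FUN = paste, collapse = tokenSep)
}

doc <- c("big", "data", "big", "platform")
ngram_local(doc, n = 1)   # counts: big 2, data 1, platform 1
ngram_local(doc, n = 2)   # "big+data" "big+platform" "data+platform"
```

If ordered permutations are what the function actually emits, each combination above would appear once per ordering; the sketch does not attempt to settle that.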


A consistent set of tools to perform in-database analytics
on the Teradata Aster Big Data Discovery Platform. toaster (a.k.a. 'to Aster')
embraces a simple two-step approach: compute in Aster, then visualize and analyze
in R. Its `compute` functions use a combination of parallel SQL, SQL-MR and
SQL-GR executing in the Aster database, a highly scalable parallel
and distributed analytical platform. Its `create` functions then visualize
the results with boxplots, scatterplots, histograms, heatmaps, word clouds,
maps, networks, or slope graphs. Advanced options such as faceting, coloring,
and labeling are supported by most visualizations.

token: Tokenize (or split) text and emit n-word combinations from a document.

Description

Usage

Arguments

Value