Usage
nGram(n, ignoreCase = FALSE, delimiter = "[ \\t\\b\\f\\r]+", punctuation = NULL, overlapping = TRUE, reset = NULL, sep = " ", minLength = 1)
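The call above can be read as a tokenize-then-slide-a-window procedure. The following is a minimal Python sketch of that reading, not the nGram source. The function name `ngrams_sketch`, the order of the steps, lower-casing as the model for `ignoreCase`, and joining words with a single space are all assumptions made for illustration.

```python
import re

def ngrams_sketch(text, n, ignore_case=False,
                  delimiter=r"[ \t\b\f\r]+", punctuation=None,
                  overlapping=True):
    """Hypothetical re-implementation of the documented nGram arguments."""
    # Remove punctuation before anything else, as `punctuation` describes.
    if punctuation is not None:
        text = re.sub(punctuation, "", text)
    # Assumption: ignoreCase is modelled by lower-casing the whole text.
    if ignore_case:
        text = text.lower()
    # Split on the delimiter regex and drop empty strings at the edges.
    words = [w for w in re.split(delimiter, text) if w]
    # Overlapping n-grams advance one word at a time; otherwise n words.
    step = 1 if overlapping else n
    # Assumption: the words of an n-gram are joined with a single space.
    return [" ".join(words[i:i + n])
            for i in range(0, len(words) - n + 1, step)]
```

For "The cat sat on the mat" with n = 2, the overlapping setting yields five bigrams ("The cat", "cat sat", ...), while `overlapping=False` yields only "The cat", "sat on", and "the mat".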
Arguments
n
length, in words, of each n-gram
ignoreCase
logical: if FALSE, n-gram matching is case sensitive;
if TRUE, case is ignored during matching.
delimiter
a character or string that separates one word from the next.
The value can be a regular expression.
punctuation
a regular expression that specifies the punctuation characters
the nGram parser removes before it evaluates the input text.
overlapping
logical: if TRUE, n-grams may overlap, that is, consecutive n-grams can share words.
reset
a regular expression matching one or more punctuation characters or
strings, any of which the nGram parser recognizes as the end of a sentence.
The end of each sentence resets the search for n-grams: nGram discards
any partial n-gram and continues searching in the next sentence.
In other words, no n-gram can span two sentences.
sep
a character string to separate multiple text columns.
minLength
minimum length, in characters, of each word in an n-gram. N-grams that contain
words shorter than this limit are meant to be omitted. The current implementation
is incomplete: instead of checking each word, it omits an n-gram only when the
n-gram's total length is below n*minLength + (n-1), which is the length of n words
of exactly minLength characters joined by single separators.
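The `reset` and `minLength` rules can be illustrated with a short Python sketch. It is a hypothetical model, not the nGram source: the names `sentence_ngrams_sketch` and `passes_intended_rule` are invented here, words are assumed to be separated by whitespace and joined with a single space, and the length filter follows the total-length rule described for the current implementation.

```python
import re

def sentence_ngrams_sketch(text, n, reset=None, min_length=1):
    """Hypothetical sketch of the documented `reset` and `minLength` rules."""
    # Split into sentences on the reset regex so no n-gram spans two of them.
    sentences = re.split(reset, text) if reset is not None else [text]
    grams = []
    for sentence in sentences:
        words = sentence.split()  # assumption: whitespace word delimiter
        # Windows never cross a sentence end, so partial n-grams are dropped.
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            # Documented current behaviour: only the total length is checked.
            if len(gram) >= n * min_length + (n - 1):
                grams.append(gram)
    return grams

def passes_intended_rule(gram, min_length):
    # Documented intent: every word must have at least min_length characters.
    return all(len(w) >= min_length for w in gram.split())
```

With `reset=r"[.]"`, the text "a b c. d e f" produces "a b", "b c", "d e", and "e f" but never "c d". With n = 2 and `min_length=3` the threshold is 2*3 + 1 = 7, so "a longword" (10 characters) is kept even though "a" is shorter than 3, which is exactly the gap between the current and the intended behaviour; "ab cd" (5 characters) is omitted.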