Usage
token(n, tokenSep = "+", ignoreCase = FALSE, delimiter = "[ \\t\\b\\f\\r]+", punctuation = NULL, stemming = FALSE, stopWords = FALSE, sep = " ", minLength = 1)
Arguments
tokenSep
a character string to separate the tokens when n > 1
ignoreCase
logical: treat text as-is (FALSE) or convert it to all lowercase (TRUE).
Default is FALSE. Note that if stemming is set to TRUE, tokens are always
converted to lowercase, so this option is ignored.
delimiter
a character or string that divides one word from the next. The value can be
a regular expression. The default, "[ \\t\\b\\f\\r]+", matches runs of spaces,
tabs, backspaces, form feeds, and carriage returns.
punctuation
a regular expression that specifies the punctuation characters the parser
removes before it evaluates the input text.
stemming
logical: if TRUE, apply Porter2 stemming to each token to reduce it to its
root form. Default is FALSE.
stopWords
logical, or a string with the name of the file that contains the stop words.
If TRUE or a file name is given, stop words are ignored when parsing text.
In the file, each stop word is specified on a separate line.
sep
a character string to separate multiple text columns.
minLength
exclude tokens shorter than minLength characters.
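Examples
The following is a minimal base-R sketch of how the arguments above might interact. It is an illustration, not the package's own token() implementation. The helper name sketch_tokens, its text argument, the order of the processing steps, and passing stop words as a character vector (rather than a file name) are all assumptions made for this example. Stemming and sep are left out.

```r
# Hypothetical helper: approximates the documented arguments with base R only.
# The step order below is an assumption, not documented behavior.
sketch_tokens <- function(text, n = 1, tokenSep = "+", ignoreCase = FALSE,
                          delimiter = "[ \\t\\b\\f\\r]+", punctuation = NULL,
                          stopWords = character(0), minLength = 1) {
  # Remove punctuation before the text is evaluated.
  if (!is.null(punctuation)) text <- gsub(punctuation, "", text, perl = TRUE)
  # ignoreCase = TRUE converts the text to lowercase.
  if (ignoreCase) text <- tolower(text)
  # Split on the delimiter regex; perl = TRUE so \b inside [] means backspace.
  words <- strsplit(text, delimiter, perl = TRUE)[[1]]
  # Exclude tokens shorter than minLength, then drop stop words.
  words <- words[nchar(words) >= minLength]
  words <- words[!(words %in% stopWords)]
  if (n <= 1) return(words)
  if (length(words) < n) return(character(0))
  # Join each run of n consecutive words with tokenSep.
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = tokenSep),
         character(1))
}

bigrams <- sketch_tokens("The quick, brown fox!", n = 2, ignoreCase = TRUE,
                         punctuation = "[,!]", stopWords = "the")
print(bigrams)
```

With these inputs the sketch strips the comma and exclamation mark, lowercases the text, drops "the", and joins the remaining neighbours with "+". The real function may order these steps differently.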