Description:

R interfaces to Weka tokenizers.

Usage:
AlphabeticTokenizer(x, control = NULL)
NGramTokenizer(x, control = NULL)
WordTokenizer(x, control = NULL)

Value:

A character vector with the tokenized strings.

Arguments:
x: a character vector with strings to be tokenized.
control: an object of class Weka_control, or a character vector of control options, or NULL (default). Available options can be obtained on-line using the Weka Option Wizard (WOW), or from the Weka documentation.
Details:

AlphabeticTokenizer is an alphabetic string tokenizer, in which tokens are formed only from contiguous alphabetic sequences.

NGramTokenizer splits strings into \(n\)-grams with given minimal and maximal numbers of grams.

WordTokenizer is a simple word tokenizer.
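Examples:

A minimal usage sketch. This assumes the RWeka package is installed together with a working Java runtime (Weka is Java-based); the control option names min and max for NGramTokenizer follow the Weka N-gram tokenizer's minimum and maximum gram sizes.

## Requires RWeka and a working Java installation.
library(RWeka)

x <- "The quick brown fox"

## Simple word tokenization.
WordTokenizer(x)

## Bigrams and trigrams via Weka control options
## (min/max numbers of grams).
NGramTokenizer(x, Weka_control(min = 2, max = 3))

## Tokens are formed from contiguous alphabetic
## sequences only, so digits act as separators.
AlphabeticTokenizer("foo123bar")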