R interfaces to Weka tokenizers.
Usage:

AlphabeticTokenizer(x, control = NULL)
NGramTokenizer(x, control = NULL)
WordTokenizer(x, control = NULL)
Arguments:

x: a character vector with strings to be tokenized.
control: a character vector of control options to pass to the underlying Weka tokenizer, or NULL (the default).
Value:

A character vector with the tokenized strings.
Details:

AlphabeticTokenizer is an alphabetic string tokenizer: tokens are formed only from contiguous alphabetic sequences, so digits and punctuation end a token.

NGramTokenizer splits strings into n-grams with given minimal and maximal numbers of grams.

WordTokenizer is a simple word tokenizer.
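A minimal usage sketch, assuming the RWeka package (which requires a working Java installation) is installed; the min/max n-gram sizes are set via Weka_control, and the example strings are illustrative:

```r
library(RWeka)

## Split a string into 1- to 3-grams by passing the Weka
## tokenizer's -min and -max options through Weka_control.
NGramTokenizer("the quick brown fox",
               control = Weka_control(min = 1, max = 3))

## Alphabetic tokens only: non-alphabetic characters such as
## digits terminate a token.
AlphabeticTokenizer("R2D2 is a droid")
```

All three tokenizers return a plain character vector, so their output can be fed directly into standard R string and table functions.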