A character vector, or an object that can be coerced to character by
as.character.
Value
A character vector consisting of tokens obtained by tokenization of x.
Details
The quality and correctness of a tokenization algorithm depend highly
on the context and application scenario. Relevant factors are the
language of the underlying text and the notions of whitespace (which
can vary with the encoding and the language used) and punctuation
marks. Consequently, for superior results you probably need a custom
tokenization function.
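As a minimal sketch of such a custom tokenization function, the following base R code splits on runs of whitespace and then strips leading and trailing punctuation. The function name my_tokenizer and the specific rules are illustrative assumptions, not part of the package; real applications may need language- and encoding-aware rules.

```r
## Illustrative custom tokenizer (assumption: base R only).
## Splits on runs of whitespace, then strips punctuation at token edges.
my_tokenizer <- function(x) {
  x <- as.character(x)
  ## Split each string on one or more whitespace characters.
  tokens <- unlist(strsplit(x, "[[:space:]]+"))
  ## Remove leading/trailing punctuation from each token.
  tokens <- gsub("^[[:punct:]]+|[[:punct:]]+$", "", tokens)
  ## Drop tokens that became empty after stripping.
  tokens[nzchar(tokens)]
}

my_tokenizer("Hello, world!  A custom tokenizer.")
## "Hello" "world" "A" "custom" "tokenizer"
```

A function with this signature (character vector in, character vector of tokens out) can typically be plugged in wherever a tokenizer is expected, for example via the tokenize argument of corpus-processing functions that accept one.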
See Also
getTokenizers to list tokenizers provided by package tm.
Regexp_Tokenizer for tokenizers using regular expressions
provided by package NLP.
tokenize for a simple regular expression based tokenizer
provided by package tau.