Regexp_Tokenizer() creates regexp span tokenizers, which use the
given pattern and ... arguments to match tokens or
separators between tokens via gregexpr(), and then
transform the match results into the character spans of the
tokens found.
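The span-building step can be sketched outside of R as well. The following Python analogue (the function name and the match_tokens argument are illustrative, not part of the R API) shows the two modes: when the pattern matches tokens, the match spans are returned directly; when it matches separators, the tokens are the gaps between matches.

```python
import re

def regexp_span_tokenize(text, pattern, match_tokens=True):
    """Return (start, end) character spans of tokens.

    If match_tokens is True, the pattern matches the tokens
    themselves; otherwise it matches separators, and tokens are
    the stretches of text between consecutive matches.
    (Hypothetical analogue of Regexp_Tokenizer().)
    """
    if match_tokens:
        return [m.span() for m in re.finditer(pattern, text)]
    spans, pos = [], 0
    for m in re.finditer(pattern, text):
        if m.start() > pos:
            spans.append((pos, m.start()))
        pos = m.end()
    if pos < len(text):
        spans.append((pos, len(text)))
    return spans
```

Note that Python spans are 0-based and end-exclusive, whereas R character spans are 1-based and end-inclusive.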
whitespace_tokenizer() tokenizes by treating any sequence of
whitespace characters as a separator.
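In separator terms, the behavior corresponds to splitting on the pattern \s+, as this hedged Python sketch illustrates (not the R implementation):

```python
import re

def whitespace_tokenize(text):
    # Any run of whitespace characters acts as a separator;
    # tokens are the non-empty pieces between the runs.
    return [tok for tok in re.split(r"\s+", text) if tok]
```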
blankline_tokenizer() tokenizes by treating any sequence of
blank lines as a separator.
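A Python sketch of the same idea (an illustration, not the R code) splits the text wherever two or more line breaks occur with only whitespace between them:

```python
import re

def blankline_tokenize(text):
    # A blank line is a line containing at most whitespace; runs
    # of blank lines separate the resulting chunks.
    return [p for p in re.split(r"\n\s*\n", text) if p.strip()]
```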
wordpunct_tokenizer() tokenizes by matching sequences of
alphabetic characters and sequences of (non-whitespace) non-alphabetic
characters.
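The matching pattern can be expressed as an alternation of the two token classes. This Python sketch uses ASCII letter ranges for simplicity, whereas the R tokenizer would typically use locale-aware classes such as [[:alpha:]]:

```python
import re

def wordpunct_tokenize(text):
    # Match either a run of alphabetic characters, or a run of
    # non-whitespace, non-alphabetic characters (punctuation,
    # digits, symbols).
    return re.findall(r"[A-Za-z]+|[^A-Za-z\s]+", text)
```

For example, punctuation attached to a word becomes its own token rather than being absorbed into the word.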