Character vector of vocabulary tokens. The tokens are assumed to
be in index order, with the first token assigned index zero for
compatibility with Python implementations.
unk_token
Token used to represent unknown (out-of-vocabulary) words.
max_chars
Maximum length, in characters, of a word that is recognized.
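
As a minimal sketch of the zero-based convention described for the vocabulary: R vectors are indexed from one, so the id of a token is its position minus one. The toy vocabulary and the name `token_ids` below are hypothetical and chosen only for illustration; they are not part of the package.

```r
# Hypothetical toy vocabulary, not taken from any real model.
vocab <- c("[PAD]", "[UNK]", "the", "cat", "##s")

# R positions start at 1, so subtract 1 to obtain zero-based token ids.
token_ids <- seq_along(vocab) - 1L
names(token_ids) <- vocab

token_ids[["[PAD]"]]  # first token gets id 0
token_ids[["cat"]]    # fourth token gets id 3
```

Keeping the ids zero-based means the same vocabulary file yields the same ids as a Python tokenizer would.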
Value
A list of named integer vectors, giving the tokenization of the input
sequences. The integer values are the token ids, and the names are the
tokens.
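
The shape of the return value can be illustrated with a hand-built object. This is only a sketch of the structure described above: the list `tokenized` and its contents are hypothetical and were not produced by the tokenizer itself.

```r
# Hypothetical result for two input sequences, constructed by hand.
# Each list element is a named integer vector: values are token ids,
# names are the corresponding tokens.
tokenized <- list(
  c("the" = 2L, "cat" = 3L, "##s" = 4L),
  c("[UNK]" = 1L)
)

# Recover the ids and the tokens for the first sequence.
ids <- unname(tokenized[[1]])
tokens <- names(tokenized[[1]])
```

Using `unname()` drops the token names when only the integer ids are needed, for example as model input.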