Tokenize an Input Word-by-word
Usage:

.wp_tokenize_single_string(words, vocab, unk_token, max_chars)

Arguments:

words: Character; a vector of words (generated by space-tokenizing a single input).

vocab: Character vector of vocabulary tokens. The tokens are assumed to be listed in index order, with the first index taken to be zero for compatibility with Python implementations.

unk_token: Token used to represent unknown words.

max_chars: Maximum length of word recognized.

Value:

A named integer vector of tokenized words.
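To illustrate the behavior documented above, here is a minimal Python sketch of greedy longest-match-first WordPiece tokenization over a vector of words. The function name, the "##" continuation prefix, and the exact fallback behavior (a word longer than max_chars, or one that cannot be segmented, maps to the unknown token) are conventions from common WordPiece implementations, not details confirmed by this reference page; vocabulary indices are zero-based, matching the note about Python compatibility. Token/index pairs stand in for R's named integer vector.

```python
def wp_tokenize_single_string(words, vocab, unk_token="[UNK]", max_chars=100):
    """Sketch of greedy longest-match-first WordPiece tokenization.

    Returns a list of (token, index) pairs, standing in for the
    named integer vector that the R function returns.
    """
    # Zero-based token -> index lookup, per the Python-compatible indexing.
    index = {tok: i for i, tok in enumerate(vocab)}
    out = []
    for word in words:
        # Words longer than max_chars are not recognized (assumed fallback).
        if len(word) > max_chars:
            out.append((unk_token, index[unk_token]))
            continue
        start = 0
        pieces = []
        ok = True
        while start < len(word):
            end = len(word)
            match = None
            # Find the longest vocabulary entry matching at `start`;
            # non-initial pieces carry the "##" continuation prefix
            # (a BERT-style convention, assumed here).
            while start < end:
                piece = word[start:end]
                if start > 0:
                    piece = "##" + piece
                if piece in index:
                    match = piece
                    break
                end -= 1
            if match is None:
                ok = False  # no vocab piece matches: whole word is unknown
                break
            pieces.append(match)
            start = end
        if ok:
            out.extend((p, index[p]) for p in pieces)
        else:
            out.append((unk_token, index[unk_token]))
    return out
```

For example, with a vocabulary containing "un", "##aff", and "##able", the word "unaffable" splits into those three pieces, while a word with no matching pieces falls back to the unknown token.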