.mp_tokenize_single_string: Tokenize an Input Word-by-word
Description
Tokenize a single input, one word at a time, after it has already been split into words on whitespace.
Usage
.mp_tokenize_single_string(words, vocab, lookup, unk_token, max_chars)
Arguments
words
Character; a vector of words, generated by splitting a single input on
whitespace.
vocab
A morphemepiece vocabulary.
lookup
A morphemepiece lookup table.
unk_token
Token to represent unknown words.
max_chars
Maximum length, in characters, of a word that will be recognized.
Value
A named integer vector of tokenized words.
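This page does not describe how each word is split, so the following is only a hypothetical, simplified sketch of a word-by-word tokenizer with the same arguments and the same kind of return value. It is not the morphemepiece implementation. The function name `tokenize_words_sketch`, the assumption that `vocab` is a named integer vector (token to id), the assumption that `lookup` is a named list (word to its pre-computed pieces), and the `"##"` piece prefix are all illustrative assumptions.

```r
# Hypothetical sketch; NOT the morphemepiece algorithm.
# Assumes: `vocab` is a named integer vector (token -> id) and
# `lookup` is a named list mapping a word to a character vector of pieces.
tokenize_words_sketch <- function(words, vocab, lookup,
                                  unk_token = "[UNK]", max_chars = 100L) {
  tokens <- character(0)
  for (word in words) {
    if (nchar(word) > max_chars) {
      pieces <- unk_token          # overlong words are treated as unknown
    } else if (word %in% names(vocab)) {
      pieces <- word               # the whole word is already a vocabulary token
    } else if (word %in% names(lookup)) {
      pieces <- lookup[[word]]     # use the pre-computed split from the lookup
    } else {
      pieces <- unk_token          # no match in vocab or lookup
    }
    tokens <- c(tokens, pieces)
  }
  # Any piece missing from the vocabulary falls back to the unknown token.
  tokens[!tokens %in% names(vocab)] <- unk_token
  # Indexing a named integer vector by name returns a named integer vector:
  # names are the tokens, values are their ids.
  vocab[tokens]
}

# Usage with a toy (hypothetical) vocabulary and lookup table:
vocab <- c("[UNK]" = 0L, "the" = 1L, "walk" = 2L, "##ing" = 3L)
lookup <- list(walking = c("walk", "##ing"))
tokenize_words_sketch(c("the", "walking", "zzz"), vocab, lookup)
```

Checking the vocabulary before the lookup keeps common whole words as single tokens; the real function may order or combine these steps differently.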