Given a single sequence of text and a morphemepiece vocabulary, tokenizes the text.
morphemepiece_tokenize(
  text,
  vocab = morphemepiece_vocab(),
  lookup = morphemepiece_lookup(),
  unk_token = "[UNK]",
  max_chars = 100
)

text: Character scalar; text to tokenize.
vocab: A morphemepiece vocabulary.
lookup: A morphemepiece lookup table.
unk_token: Token to represent unknown words.
max_chars: Maximum length of word recognized.

A character vector of tokenized text. (Later, this should be a named integer vector, as in the wordpiece package.)
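
A minimal usage sketch, following the signature shown above. It assumes the morphemepiece package is installed and that the default vocabulary and lookup table returned by morphemepiece_vocab() and morphemepiece_lookup() can be loaded in your environment. The input string is an arbitrary illustration, and the exact tokens returned depend on the vocabulary in use, so no output is shown.

```r
# Sketch only: assumes the morphemepiece package is installed and that its
# default vocabulary and lookup table are available.
if (requireNamespace("morphemepiece", quietly = TRUE)) {
  tokens <- morphemepiece::morphemepiece_tokenize(
    text = "Surprisingly easy",  # a single character scalar (illustrative input)
    unk_token = "[UNK]",         # token substituted for unknown words
    max_chars = 100              # maximum length of word recognized
  )
  print(tokens)
}
```

The vocab and lookup arguments are left at their defaults here; pass your own objects if you have a custom morphemepiece vocabulary and lookup table.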