morphemepiece_tokenize

text: Character scalar; text to tokenize.

vocab: A morphemepiece vocabulary.

lookup: A morphemepiece lookup table giving the morpheme breakdown of words.

unk_token

max_chars: Maximum length of word recognized.

Given a single sequence of text and a morphemepiece vocabulary, tokenizes the
text.

Tokenize text into morphemes. The morphemepiece algorithm uses a
lookup table to determine the morpheme breakdown of words, and falls back on a
modified wordpiece tokenization algorithm for words not found in the lookup
table.
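
The lookup-then-fallback flow described above can be illustrated with a small, language-neutral sketch. It is written in Python for brevity and is not the package's R implementation. Several details are assumptions made for illustration only: the helper names `tokenize_sketch` and `wordpiece_fallback`, the whitespace word splitting, the "##" continuation marker (borrowed from standard wordpiece), the default values, and the use of plain greedy longest-match wordpiece in place of the modified variant the package uses.

```python
def wordpiece_fallback(word, vocab, unk_token="[UNK]"):
    """Plain greedy longest-match-first wordpiece (assumed stand-in for the
    package's modified variant). Pieces after the first carry a '##' prefix."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position, so the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def tokenize_sketch(text, vocab, lookup, unk_token="[UNK]", max_chars=100):
    """Hypothetical illustration: try the lookup table first, then fall back."""
    tokens = []
    for word in text.split():  # simplification: split on whitespace only
        if len(word) > max_chars:
            tokens.append(unk_token)       # word too long to be recognized
        elif word in lookup:
            tokens.extend(lookup[word])    # morpheme breakdown from the table
        else:
            tokens.extend(wordpiece_fallback(word, vocab, unk_token))
    return tokens


# Toy data, invented for this example.
toy_vocab = {"the", "cat", "##s", "jump", "##ed"}
toy_lookup = {"cats": ["cat", "##s"]}

result = tokenize_sketch("the cats jumped", toy_vocab, toy_lookup)
print(result)
```

In this toy run, "cats" is resolved through the lookup table, while "the" and "jumped" are not in the table and go through the fallback. A word with no matching pieces, or one longer than `max_chars`, comes back as the unknown token.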

morphemepiece_tokenize: Tokenize Sequence with Morpheme Pieces
