prepare_vocab

token_list

<p>We use a character vector with class morphemepiece_vocabulary to provide
information about tokens used in
<code><a rd-options="" href="/link/morphemepiece_tokenize?package=morphemepiece&version=1.2.3" data-mini-rdoc="morphemepiece::morphemepiece_tokenize">morphemepiece_tokenize</a></code>. This function takes a character vector
of tokens and puts it into that format.</p>

Tokenize text into morphemes. The morphemepiece algorithm uses a
lookup table to determine the morpheme breakdown of words, and falls back on a
modified wordpiece tokenization algorithm for words not found in the lookup
table.
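
The lookup-then-fallback idea described above can be illustrated with a toy sketch in base R. This is not the package's implementation: `toy_lookup`, `toy_vocab`, `toy_wordpiece`, and `toy_tokenize` are hypothetical names, the `##` continuation marker is borrowed from the usual wordpiece convention as an assumption, and the fallback here is a plain greedy longest-match split rather than the modified wordpiece algorithm the package uses.

```r
# Hypothetical lookup table: word -> morpheme breakdown (illustration only).
toy_lookup <- list(
  walked = c("walk", "##ed"),
  cats = c("cat", "##s")
)

# Hypothetical vocabulary used by the fallback splitter.
toy_vocab <- c("walk", "cat", "dog", "##ed", "##s", "##gy")

# Greedy longest-match-first split, in the style of plain wordpiece.
# Returns the unknown token if some part of the word cannot be matched.
toy_wordpiece <- function(word, vocab, unk = "[UNK]") {
  pieces <- character(0)
  start <- 1L
  n <- nchar(word)
  while (start <= n) {
    end <- n
    found <- NULL
    while (end >= start) {
      candidate <- substr(word, start, end)
      # Pieces that do not start the word carry the continuation marker.
      if (start > 1L) candidate <- paste0("##", candidate)
      if (candidate %in% vocab) {
        found <- candidate
        break
      }
      end <- end - 1L
    }
    if (is.null(found)) return(unk)
    pieces <- c(pieces, found)
    start <- end + 1L
  }
  pieces
}

# Use the lookup table when the word is present; otherwise fall back.
toy_tokenize <- function(word, lookup, vocab) {
  if (!is.null(lookup[[word]])) {
    return(lookup[[word]])
  }
  toy_wordpiece(word, vocab)
}

from_lookup <- toy_tokenize("walked", toy_lookup, toy_vocab)
from_fallback <- toy_tokenize("doggy", toy_lookup, toy_vocab)
print(from_lookup)    # "walk" "##ed" (taken from the lookup table)
print(from_fallback)  # "dog" "##gy" (produced by the greedy fallback)
```

The lookup is tried first, so words with a known breakdown never reach the greedy splitter. Only unlisted words take the fallback path.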

Jonathan Bratt

morphemepiece

Morpheme Tokenization

Jon Harmon

Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning 

prepare_vocab: Format a Token List as a Vocabulary

Description

Usage

Arguments

token_list: A character vector of tokens.

Value

The tokens as a character vector with class morphemepiece_vocabulary.

Examples
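
A minimal usage sketch, assuming the morphemepiece package is installed. The token values are made up for illustration, and the call is guarded so the snippet does nothing if the package is unavailable. Only the documented `token_list` argument and the documented `morphemepiece_vocabulary` class are relied on.

```r
# Assumption: the morphemepiece package is installed; skipped otherwise.
if (requireNamespace("morphemepiece", quietly = TRUE)) {
  # Hypothetical tokens, chosen only for illustration.
  token_list <- c("some", "example", "tokens")

  # Format the plain character vector as a vocabulary.
  my_vocab <- morphemepiece::prepare_vocab(token_list)

  # Inspect the class of the result.
  print(class(my_vocab))
  print(inherits(my_vocab, "morphemepiece_vocabulary"))
}
```

The resulting object is the kind of vocabulary that morphemepiece_tokenize expects.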