prepare_vocab

token_list

<p>A vocabulary is stored as a named integer vector with class <code>wordpiece_vocabulary</code>,
which provides information about the tokens used in <code><a rd-options="" href="/link/wordpiece_tokenize?package=wordpiece&version=2.1.3" data-mini-rdoc="wordpiece::wordpiece_tokenize">wordpiece_tokenize</a></code>.
This function takes a character vector of tokens and converts it to that
format.</p>

Apply 'Wordpiece' (<arXiv:1609.08144>) tokenization to input text,
given an appropriate vocabulary. The 'BERT' (<arXiv:1810.04805>) tokenization
conventions are used by default.

Jonathan Bratt

wordpiece

R Implementation of Wordpiece Tokenization

Jon Harmon

Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning 

prepare_vocab: Format a Token List as a Vocabulary

Description

Usage

Arguments

Value

Examples
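
<p>The description says a vocabulary is a named integer vector carrying the class <code>wordpiece_vocabulary</code>. The base-R sketch below illustrates that shape only. The helper <code>sketch_vocab</code> is hypothetical and is not the package's implementation, and the zero-based indexing is an assumption made for the illustration. With the package installed, the corresponding call would be <code>prepare_vocab(token_list)</code>.</p>

```r
# Hypothetical stand-in for illustration; NOT the wordpiece package's code.
sketch_vocab <- function(token_list) {
  stopifnot(is.character(token_list))
  # Assumption: each token's index is its zero-based position in the list.
  index <- seq_along(token_list) - 1L
  # Name each index by its token, giving a named integer vector.
  names(index) <- token_list
  # Attach the class described in the documentation above.
  structure(index, class = c("wordpiece_vocabulary", "integer"))
}

tokens <- c("[UNK]", "some", "example", "##s")
vocab <- sketch_vocab(tokens)

names(vocab)                             # the tokens, in their original order
unclass(vocab)[["some"]]                 # integer index stored under "some"
inherits(vocab, "wordpiece_vocabulary")  # class check
```

<p>Keeping the tokens as names and the indices as values means a token's index can be looked up by name, while the class attribute lets downstream functions check that they received a prepared vocabulary rather than a raw character vector.</p>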