(I'm not sure that this object-based approach is best for an R implementation, but for now I'm just trying to reproduce the Python functionality.)
Usage:
WordpieceTokenizer(vocab, unk_token = "[UNK]", max_input_chars_per_word = 200)
Arguments:
vocab: Recognized vocabulary tokens, as a named integer vector. (Names are tokens; values are indices.) A short illustrative sketch follows this list.
unk_token: Token to use for unknown words.
max_input_chars_per_word: Length of the longest word we will recognize.
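As a minimal sketch of the expected vocab format (the token strings and index values below are made up for illustration; a real vocabulary would normally come from load_vocab()):

# Hypothetical named integer vector: names are tokens, values are indices.
# The indexing convention (0- vs. 1-based) should match whatever
# load_vocab() produces for your vocabulary file.
vocab <- c("[UNK]" = 0L, "[CLS]" = 1L, "[SEP]" = 2L,
           "un" = 3L, "##aff" = 4L, "##able" = 5L)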
Value:
An object of class WordpieceTokenizer. It has the method tokenize.WordpieceTokenizer().
Examples:
vocab <- load_vocab(vocab_file = "vocab.txt")
wp_tokenizer <- WordpieceTokenizer(vocab)
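As a hedged follow-up to the example above: the method name tokenize.WordpieceTokenizer() suggests an S3 generic tokenize() that dispatches on the tokenizer object, but the argument names used here (in particular text) are assumptions rather than the package's confirmed signature.

# Assumed call through a tokenize() S3 generic; the "text" argument name is
# hypothetical -- see the tokenize.WordpieceTokenizer() documentation for the
# actual signature.
pieces <- tokenize(wp_tokenizer, text = "unaffable")
# For a vocabulary containing the relevant pieces, wordpiece tokenization is
# expected to yield something like c("un", "##aff", "##able"), falling back
# to unk_token for words it cannot split into known pieces.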