BPE trainer
tok::tok_trainer -> tok_trainer_bpe
new()
Constructor for the BPE trainer.
trainer_bpe$new(
vocab_size = NULL,
min_frequency = NULL,
show_progress = NULL,
special_tokens = NULL,
limit_alphabet = NULL,
initial_alphabet = NULL,
continuing_subword_prefix = NULL,
end_of_word_suffix = NULL,
max_token_length = NULL
)
vocab_size: The size of the final vocabulary, including all tokens and alphabet. Default: NULL.
min_frequency: The minimum frequency a pair should have in order to be merged. Default: NULL.
show_progress: Whether to show progress bars while training. Default: TRUE.
special_tokens: A list of special tokens the model should be aware of. Default: NULL.
limit_alphabet: The maximum number of different characters to keep in the alphabet. Default: NULL.
initial_alphabet: A list of characters to include in the initial alphabet, even if not seen in the training dataset. Default: NULL.
continuing_subword_prefix: A prefix to be used for every subword that is not a beginning-of-word. Default: NULL.
end_of_word_suffix: A suffix to be used for every subword that is an end-of-word. Default: NULL.
max_token_length: Prevents creating tokens longer than the specified size. Default: NULL.
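As a minimal sketch of how the constructor arguments above might be used, the snippet below collects two of them in a list and passes them to trainer_bpe$new(). The values 1000 and 2 are illustrative assumptions, not recommended settings, and the trainer is only built when the tok package is installed.

```r
# Illustrative settings only; these values are assumptions, not defaults.
bpe_args <- list(
  vocab_size = 1000,   # target size of the final vocabulary
  min_frequency = 2    # a pair must occur at least twice to be merged
)

# Build the trainer only if the tok package is available.
# Arguments left out of bpe_args keep their NULL defaults.
if (requireNamespace("tok", quietly = TRUE)) {
  trainer <- do.call(tok::trainer_bpe$new, bpe_args)
}
```

Keeping the settings in a named list makes it easy to reuse the same configuration when comparing trainers.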
clone()
The objects of this class are cloneable with this method.
trainer_bpe$clone(deep = FALSE)
deep: Whether to make a deep clone.
Other trainer:
tok_trainer,
trainer_unigram,
trainer_wordpiece