tok
tok provides bindings to the [
Install
install.packages('tok')
Monthly Downloads: 21,328
Version: 0.2.1
License: MIT + file LICENSE
Issues: 0
Pull Requests: 1
Stars: 46
Forks: 2
Repository: https://github.com/mlverse/tok
Maintainer: Daniel Falbel
Last Published: September 30th, 2025
Functions in tok (0.2.1)
trainer_wordpiece: WordPiece tokenizer trainer
trainer_bpe: BPE trainer
tok_model: Generic class for tokenization models
tok_trainer: Generic training class
tok_processor: Generic class for processors
tok_decoder: Generic class for decoders
tok_normalizer: Generic class for normalizers
trainer_unigram: Unigram tokenizer trainer
processor_byte_level: Byte Level post processor
tokenizer: Tokenizer
model_bpe: BPE model
pre_tokenizer_whitespace: This pre-tokenizer simply splits using the following regex: \w+|[^\w\s]+
pre_tokenizer: Generic class for tokenizers
pre_tokenizer_byte_level: Byte level pre tokenizer
normalizer_nfc: NFC normalizer
decoder_byte_level: Byte level decoder
model_wordpiece: An implementation of the WordPiece algorithm
model_unigram: An implementation of the Unigram algorithm
normalizer_nfkc: NFKC normalizer
encoding: Encoding
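
The regex quoted in the pre_tokenizer_whitespace entry can be tried out in base R without loading tok. The sketch below is only an illustration of what that pattern matches: the helper name split_whitespace is hypothetical and not part of tok, and it assumes that PCRE-style matching (perl = TRUE) is close enough to the regex engine the package uses for plain ASCII input.

```r
# Regex quoted in the pre_tokenizer_whitespace entry, escaped for an R string.
pattern <- "\\w+|[^\\w\\s]+"

# Hypothetical helper (not part of tok): return every match of the pattern.
# perl = TRUE is needed so that \w and \s are understood inside [^...].
split_whitespace <- function(text) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

pieces <- split_whitespace("Hello, world! It's 9am.")
print(pieces)
```

Runs of word characters and runs of punctuation come out as separate pieces, and whitespace is dropped rather than kept as a piece of its own.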