A newer version (0.2.1) of this package is available.

tok

tok provides bindings to the [

Install

install.packages('tok')

Monthly Downloads

21,328

Version

0.2.0

License

MIT + file LICENSE

Maintainer

Daniel Falbel

Last Published

September 30th, 2025

Functions in tok (0.2.0)

tok_trainer

Generic training class

tok_processor

Generic class for processors

trainer_bpe

BPE trainer

tokenizer

Tokenizer

tok_decoder

Generic class for decoders

processor_byte_level

Byte Level post processor

trainer_wordpiece

WordPiece tokenizer trainer

tok_normalizer

Generic class for normalizers

tok_model

Generic class for tokenization models

trainer_unigram

Unigram tokenizer trainer

normalizer_nfc

NFC normalizer

pre_tokenizer_byte_level

Byte level pre tokenizer

model_bpe

BPE model

normalizer_nfkc

NFKC normalizer

model_wordpiece

An implementation of the WordPiece algorithm

model_unigram

An implementation of the Unigram algorithm

pre_tokenizer_whitespace

This pre-tokenizer simply splits using the following regex: \w+|[^\w\s]+

pre_tokenizer

Generic class for tokenizers

encoding

Encoding

decoder_byte_level

Byte level decoder
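
The regex quoted in the pre_tokenizer_whitespace entry can be tried out in base R without installing anything. The sketch below only illustrates the splitting rule; it is not how tok implements the pre-tokenizer, and the helper name split_whitespace is hypothetical.

```r
# Illustrative only: applies the pattern \w+|[^\w\s]+ with base R.
# split_whitespace is a made-up helper, not part of the tok package.
split_whitespace <- function(text) {
  # perl = TRUE selects PCRE, so \w and \s are also valid inside [^...]
  matches <- gregexpr("\\w+|[^\\w\\s]+", text, perl = TRUE)
  regmatches(text, matches)[[1]]
}

split_whitespace("Hello, world!")
# [1] "Hello" ","     "world" "!"
```

Runs of word characters and runs of punctuation become separate pieces, and whitespace is dropped.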