Represents the output of a tokenizer.
An encoding object containing encoding information such as attention masks and token IDs.
.encoding
The underlying implementation pointer.
ids
The IDs are the main input to a Language Model. They are the token indices, the numerical representations that an LM understands.
attention_mask
The attention mask used as input for transformers models.
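As a sketch of how these fields are read (assuming the tok package is installed and the "gpt2" tokenizer files can be downloaded), both `ids` and `attention_mask` are accessed directly on the encoding object:

```r
# Hypothetical sketch: field access on an encoding object, assuming the
# tok package is available and "gpt2" can be fetched from the Hub.
tok <- tokenizer$from_pretrained("gpt2")
encoding <- tok$encode("Hello world")

encoding$ids             # integer token indices fed to the model
encoding$attention_mask  # 1 for real tokens, 0 for padding
```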
encoding$new()
encoding$clone()
new()
Initializes an encoding object (not meant to be used directly).
encoding$new(encoding)
encoding
An encoding implementation object.
clone()
The objects of this class are cloneable with this method.
encoding$clone(deep = FALSE)
deep
Whether to make a deep clone.
withr::with_envvar(c(HUGGINGFACE_HUB_CACHE = tempdir()), {
  try({
    tok <- tokenizer$from_pretrained("gpt2")
    encoding <- tok$encode("Hello world")
    encoding
  })
})