tok (version 0.2.1)

encoding: Encoding

Description

Represents the output of a tokenizer.

Value

An encoding object containing encoding information such as attention masks and token ids.

Public fields

.encoding

The underlying implementation pointer.

Active bindings

ids

The IDs are the main input to a language model. They are the token indices, the numerical representation that an LM understands.

attention_mask

The attention mask used as input for transformer models. It indicates which tokens the model should attend to and which it should ignore.
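
As a minimal sketch of how these bindings might be read, the snippet below encodes a short string and inspects both fields. It reuses the `tokenizer$from_pretrained()` and `$encode()` calls from the Examples section and assumes the tok package is installed and the "gpt2" tokenizer files can be downloaded; the variable names are illustrative only.

```r
library(tok)

# Assumes network access to fetch the "gpt2" tokenizer files.
tok <- tokenizer$from_pretrained("gpt2")
enc <- tok$encode("Hello world")

ids  <- enc$ids             # token indices
mask <- enc$attention_mask  # attention mask for the same tokens

# Both vectors describe the same token sequence.
length(ids)
length(mask)
```

Both fields are read with `$` and no parentheses, since they are active bindings rather than methods.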

Methods


Method new()

Initializes an encoding object. Not intended to be called directly.

Usage

encoding$new(encoding)

Arguments

encoding

An encoding implementation object.


Method clone()

The objects of this class are cloneable with this method.

Usage

encoding$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.
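
A brief sketch of cloning, assuming an encoding produced as in the Examples section. Because `.encoding` holds a pointer to the underlying implementation, the clone presumably refers to the same underlying data rather than to a copy of it; that is an assumption here, not something this page states.

```r
library(tok)

# Assumes network access to fetch the "gpt2" tokenizer files.
tok <- tokenizer$from_pretrained("gpt2")
enc <- tok$encode("Hello world")

# Shallow clone (deep = FALSE is the default shown in Usage).
enc_copy <- enc$clone()

# The clone is a separate R object; compare the token ids it exposes.
identical(enc_copy$ids, enc$ids)
```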

Examples

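# Point the Hugging Face cache at a temporary directory for this example.
withr::with_envvar(c(HUGGINGFACE_HUB_CACHE = tempdir()), {
  # try() keeps the example from erroring if the download fails.
  try({
    tok <- tokenizer$from_pretrained("gpt2")
    encoding <- tok$encode("Hello world")
    encoding
  })
})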
