Object of class R6 which stores the text embeddings
generated by an object of class TextEmbeddingModel via the method embed().
Returns an object of class EmbeddedText. These objects are used
for storing and managing the text embeddings created with objects of class TextEmbeddingModel.
Objects of class EmbeddedText serve as input for classifiers of class
TextEmbeddingClassifierNeuralNet. The main aim of this class is to provide a structured link between
embedding models and classifiers. Because objects of this class store information on
the text embedding model that created the embeddings, they ensure that only
embeddings generated with the same embedding model are combined. Furthermore, the stored information allows
classifiers to check whether embeddings from the correct text embedding model are used for
training and prediction.
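The consistency check described above can be pictured with plain lists. This is a minimal base R sketch of the idea, not the package's own implementation: the helper same_embedding_model() is hypothetical, and the field names are assumptions that mirror the constructor arguments rather than the documented structure of the list returned by get_model_info().

```r
# Hypothetical stand-ins for the lists returned by get_model_info().
# Field names and values are assumptions chosen for illustration.
info_a <- list(
  model_name = "bert_demo",
  model_label = "Demo BERT",
  model_version = "0.0.1",
  param_seq_length = 512L
)
info_b <- info_a
info_c <- info_a
info_c$model_name <- "other_model"

# Hypothetical helper: TRUE only if both lists hold exactly the same
# fields and values, regardless of the order of the fields.
same_embedding_model <- function(x, y) {
  identical(x[order(names(x))], y[order(names(y))])
}

same_embedding_model(info_a, info_b) # TRUE: safe to combine
same_embedding_model(info_a, info_c) # FALSE: different embedding models
```

Comparing every stored field, rather than only the label, is a deliberately strict choice: two models with the same label but a different sequence length would still be treated as different.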
embeddings
('data.frame()')
data.frame containing the text embeddings for all chunks. Documents are
in the rows. Embedding dimensions are in the columns.
new()
Creates a new object representing text embeddings.
EmbeddedText$new(
model_name = NA,
model_label = NA,
model_date = NA,
model_method = NA,
model_version = NA,
model_language = NA,
param_seq_length = NA,
param_chunks = NULL,
param_overlap = NULL,
param_emb_layer_min = NULL,
param_emb_layer_max = NULL,
param_emb_pool_type = NULL,
param_aggregation = NULL,
embeddings
)
model_name
string
Name of the model that generated this embedding.
model_label
string
Label of the model that generated this embedding.
model_date
string
Date when the embedding generating model was created.
model_method
string
Method of the underlying embedding model.
model_version
string
Version of the model that generated this embedding.
model_language
string
Language of the model that generated this embedding.
param_seq_length
int
Maximum number of tokens that the generating model processes per chunk.
param_chunks
int
Maximum number of chunks which are supported by the generating model.
param_overlap
int
Number of overlapping tokens that this model added at the beginning of
each subsequent chunk.
param_emb_layer_min
int or string
Determines the first layer to be included in the creation of embeddings.
param_emb_layer_max
int or string
Determines the last layer to be included in the creation of embeddings.
param_emb_pool_type
string
Method for pooling the token embeddings within each layer.
param_aggregation
string
Aggregation method of the hidden states. Deprecated. Only included
for backward compatibility.
embeddings
data.frame
Contains the text embeddings.
Returns an object of class EmbeddedText which stores the text embeddings produced by an object of class TextEmbeddingModel. The object serves as input for objects of class TextEmbeddingClassifierNeuralNet.
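As an illustration of the signature above, the following sketch builds a small embeddings data.frame and passes it to the constructor. All argument values are hypothetical, and the call is guarded so that it runs only if EmbeddedText is available in the session (that is, assuming the package providing the class has been attached). In normal use these objects are produced by embed() of a TextEmbeddingModel rather than constructed by hand.

```r
# Toy embeddings: two documents (rows), three embedding dimensions (columns).
emb <- data.frame(
  dim_1 = c(0.12, -0.40),
  dim_2 = c(0.55, 0.31),
  dim_3 = c(-0.08, 0.27),
  row.names = c("doc_1", "doc_2")
)

# Assumption: the package providing EmbeddedText is attached. Without it,
# the block is skipped and only the data.frame above is created.
if (exists("EmbeddedText")) {
  embedded <- EmbeddedText$new(
    model_name = "bert_demo",        # hypothetical values throughout
    model_label = "Demo BERT",
    model_date = "2024-01-01",
    model_method = "bert",
    model_version = "0.0.1",
    model_language = "english",
    param_seq_length = 512L,
    param_chunks = 4L,
    param_overlap = 30L,
    param_emb_layer_min = 1L,
    param_emb_layer_max = 12L,
    param_emb_pool_type = "average",
    embeddings = emb
  )

  # Accessors documented on this page.
  print(embedded$get_model_label())
  str(embedded$get_model_info())
}
```

param_aggregation is left at its default because it is deprecated and kept only for backward compatibility.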
get_model_info()
Method for retrieving information about the model that generated this embedding.
EmbeddedText$get_model_info()
list
Contains all saved information about the underlying text embedding model.
get_model_label()
Method for retrieving the label of the model that generated this embedding.
EmbeddedText$get_model_label()
string
Label of the corresponding text embedding model.
clone()
The objects of this class are cloneable with this method.
EmbeddedText$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other Text Embedding: TextEmbeddingModel, combine_embeddings()