
LLMing (version 1.1.0)

embed: Embed texts with a Transformer model

Description

Cleans a text column and converts each text into a numeric vector using embeddings from a BERT-style Transformer model, returning the result as a dataframe. In the input dataframe, each row is one text entry.

Usage

embed(dat, layers, keep_tokens = TRUE, tokens_method = NULL)

Value

A dataframe in which each row corresponds to one input text and each column is one embedding dimension.
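Because each row of the returned dataframe is one text's embedding, rows can be compared directly. The base R sketch below computes pairwise cosine similarity between rows. The dataframe `emb` and the helper `cosine_sim()` are hypothetical stand-ins with made-up values and only four dimensions, not real output from embed(). With real output, pass in only the numeric embedding columns.

```r
# Hypothetical stand-in for the output of embed(): one row per text,
# one column per embedding dimension (values invented for illustration)
emb <- data.frame(
  V1 = c(0.2, 0.1, -0.5),
  V2 = c(0.4, 0.3, 0.1),
  V3 = c(-0.1, 0.0, 0.6),
  V4 = c(0.3, 0.2, -0.2)
)

# Hypothetical helper: cosine similarity between every pair of rows
cosine_sim <- function(df) {
  m <- as.matrix(df)
  norms <- sqrt(rowSums(m^2))            # Euclidean length of each row
  (m %*% t(m)) / (norms %o% norms)       # dot products divided by length products
}

sims <- cosine_sim(emb)
round(sims, 3)
```

The diagonal is always 1. Texts with similar meaning should score closer to 1, and unrelated or opposing texts closer to 0 or below.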

Examples

df <- data.frame(
  text = c(
    "I slept well and feel great today!",
    "I saw from friends and it went well.",
    "I think I failed that exam. I'm such a disappointment.",
    "I think I failed that exam. I'm such a disapointment."
  )
)

emb_dat <- embed(
  dat = df,
  layers = 1,
  keep_tokens = FALSE,
  tokens_method = "mean"
)

Arguments

dat

A dataframe containing the text data, one text per row.

layers

Integer vector specifying which model layers to aggregate from.

keep_tokens

Logical. If TRUE, token-level embeddings are kept in the returned object; if FALSE, they are discarded to save memory.

tokens_method

Character scalar controlling how token-level embeddings are aggregated into word types (for example, "mean").
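This page does not describe how the aggregation works internally. As a rough illustration only, the sketch below assumes that a "mean" method averages the vectors of all tokens sharing the same word type. The tokens, the matrix `tok_emb`, and the helper `aggregate_mean()` are hypothetical and are not part of LLMing.

```r
# Hypothetical token-level embeddings: one row per token occurrence,
# one column per embedding dimension (values invented for illustration)
tokens <- c("i", "failed", "that", "exam", "i", "think")
tok_emb <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    7, 8,
    3, 0,
    9, 9),
  ncol = 2, byrow = TRUE
)

# Hypothetical helper: average the rows that share a word type
aggregate_mean <- function(emb, types) {
  sums <- rowsum(emb, group = types)                  # per-type sums of token vectors
  counts <- as.vector(table(types)[rownames(sums)])   # occurrences per type, aligned to sums
  sums / counts                                       # divide each row by its count
}

word_emb <- aggregate_mean(tok_emb, tokens)
word_emb
```

Here the two occurrences of "i" collapse into one row holding their average, while word types that occur only once keep their original vector.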