embed

Cleans a text column and converts it into a dataframe of numeric vectors
using BERT embeddings. Each row of the input dataframe
is one text entry.

A collection of large language model (LLM) text analysis methods
designed with psychological data in mind. Currently, LLMing (aka "lemming")
includes a text anomaly detection method based on the angle-based subspace
approach described by Zhang, Lin, and Karim (2015)
<doi:10.1016/j.ress.2015.05.025> and a text generation method.

Lindley Slipetz

LLMing

Large Language Model (LLM) Tools for Psychological Text Analysis

Teague Henry

Siqi Sun

embed function

<dl><dt>dat</dt>
<dd>A dataframe with text data, one text per row.</dd>
<dt>layers</dt>
<dd>Integer vector specifying which model layers to aggregate from.</dd>
<dt>keep_tokens</dt>
<dd>Logical. Whether to keep token-level embeddings in the returned
object or discard them to save memory.</dd>
<dt>tokens_method</dt>
<dd>Character scalar controlling how token-level
embeddings are aggregated to word types.</dd></dl>
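To make the idea of aggregating token-level embeddings to word types more concrete, here is a toy sketch in base R. It is not LLMing's implementation and does not call embed: the token strings, the two-dimensional vectors, and the choice of the mean as the aggregation are all assumptions made for illustration. Which values tokens_method actually accepts is not stated here.

```r
# Toy illustration only: NOT LLMing's internal code.
# Hypothetical token-level embeddings: one row per token, two dimensions.
tokens <- c("the", "cat", "saw", "the", "dog")
token_emb <- matrix(
  c(1, 0,
    0, 2,
    4, 4,
    3, 2,
    6, 0),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("dim1", "dim2"))
)

# Sum the rows that share a token string; rows come back sorted by word type.
type_sums <- rowsum(token_emb, group = tokens)

# Count how often each word type occurs, in the same row order as type_sums.
type_counts <- as.vector(table(tokens)[rownames(type_sums)])

# Divide each row by its count to get the mean vector per word type
# (the mean is an assumed aggregation choice for this sketch).
type_emb <- type_sums / type_counts

print(type_emb)
```

In this sketch the two occurrences of "the" collapse into a single averaged row, so the word-type matrix is smaller than the token-level one. That is the kind of memory trade-off the keep_tokens description refers to.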


Embed texts with a Transformer model — embed

