textTrain

Train word embeddings to a numeric (ridge regression) or categorical (random forest) variable.

Link R with Transformers from Hugging Face to transform text variables to word embeddings. The word embeddings can then be used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words according to various dimensions. For more information see <https://www.r-text.org>.

Oscar Kjell

text

Analyses of Text using Transformers Models from HuggingFace,
Natural Language Processing and Machine Learning

Salvatore Giorgi

Andrew Schwartz

textTrain function

<dl><dt>x</dt>
<dd>Word embeddings from textEmbed (or textEmbedLayerAggreation).
Several word embedding variables can be analyzed at the same time. When training to
several outcomes at the same time, pass a tibble within a list rather than a bare
tibble (i.e., keep the name of the word embedding).</dd>
<dt>y</dt>
<dd>Numeric or categorical (factor) variable to predict. Several outcomes can be given,
in which case they must be placed within a tibble (this is required
even if there is only one outcome but several word embedding variables).</dd>
<dt>force_train_method</dt>
<dd>Default is "automatic": if y is a factor,
random_forest is used, and if y is numeric, ridge regression
is used. This can be overridden using "regression" or "random_forest".</dd>
<dt>...</dt>
<dd>Arguments from textTrainRegression or textTrainRandomForest,
passed on through the textTrain function.</dd></dl>
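The selection rule described for force_train_method, and the difference between a bare embedding table and one kept inside a named list, can be sketched in base R. This is only an illustration under stated assumptions: `choose_train_method` is a hypothetical helper that mirrors the documented rule and is not part of the text package, and the `embeddings` object is a toy stand-in (plain data frames with made-up values) for what textEmbed would return.

```r
# Hypothetical helper: mirrors the documented rule, not the package source.
choose_train_method <- function(y, force_train_method = "automatic") {
  force_train_method <- match.arg(
    force_train_method,
    choices = c("automatic", "regression", "random_forest")
  )
  # An explicit choice overrides the automatic selection.
  if (force_train_method != "automatic") {
    return(force_train_method)
  }
  if (is.factor(y)) {
    "random_forest"
  } else if (is.numeric(y)) {
    "regression"
  } else {
    stop("y must be numeric or a factor")
  }
}

method_numeric <- choose_train_method(c(1.5, 2.0, 3.2))
method_factor  <- choose_train_method(factor(c("a", "b", "a")))
method_forced  <- choose_train_method(c(0, 1, 1),
                                      force_train_method = "random_forest")

# Toy stand-in for word embeddings: a named list of embedding tables.
# The variable names and values are invented for this sketch.
embeddings <- list(
  harmonytext      = data.frame(Dim1 = c(0.1, 0.2), Dim2 = c(0.3, 0.4)),
  satisfactiontext = data.frame(Dim1 = c(0.5, 0.6), Dim2 = c(0.7, 0.8))
)

x_bare  <- embeddings[["harmonytext"]]  # the table alone; its name is dropped
x_named <- embeddings["harmonytext"]    # list of one table; its name is kept
```

Single-bracket subsetting (`embeddings["harmonytext"]`) is one way to keep the word embedding's name attached, which is what the description of x asks for when training to several outcomes.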

