textSimilarityMatrix

Compute semantic similarity scores between all combinations in a word embedding

Link R with Transformers from Hugging Face to transform text variables into word embeddings. The word embeddings are then used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words according to various dimensions. For more information see <https://www.r-text.org>.

Oscar Kjell

text

Analyses of Text using Transformers Models from HuggingFace,
Natural Language Processing and Machine Learning

Salvatore Giorgi

Andrew Schwartz

textSimilarityMatrix function

<dl><dt>x</dt>
<dd>Word embeddings from textEmbed.</dd>
<dt>method</dt>
<dd>Character string describing the type of measure to be computed. Default is "cosine" (see also
"spearman" and "pearson", as well as measures from textDistance(), which are computed here as 1 - textDistance,
including "euclidean", "maximum", "manhattan", "canberra", "binary" and "minkowski").</dd>
<dt>center</dt>
<dd>(boolean; from base::scale) If center is TRUE then centering is done by subtracting the column means
(omitting NAs) of x from their corresponding columns, and if center is FALSE, no centering is done.</dd>
<dt>scale</dt>
<dd>(boolean; from base::scale) If scale is TRUE then scaling is done by dividing the (centered)
columns of x by their standard deviations if center is TRUE, and the root mean square otherwise.</dd></dl>
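The arguments above can be illustrated with a minimal base-R sketch. It does not call the text package: `similarity_matrix_sketch` is a hypothetical helper, the toy matrix `emb` stands in for embeddings returned by textEmbed, and the `center = FALSE`, `scale = FALSE` defaults are chosen for the illustration only and are not taken from the package. It assumes the embeddings are held as a numeric matrix with one row per text; the package's internal implementation may differ.

```r
# Hypothetical helper (not part of the text package): pairwise similarity
# between all rows of a numeric embedding matrix, using base R only.
similarity_matrix_sketch <- function(x,
                                     method = c("cosine", "pearson", "spearman",
                                                "euclidean", "maximum", "manhattan",
                                                "canberra", "binary", "minkowski"),
                                     center = FALSE,
                                     scale = FALSE) {
  method <- match.arg(method)
  x <- as.matrix(x)

  # Column-wise centering and scaling, as described for base::scale().
  x <- base::scale(x, center = center, scale = scale)

  if (method == "cosine") {
    # Cosine similarity: dot products divided by the product of row norms.
    norms <- sqrt(rowSums(x^2))
    return(tcrossprod(x) / outer(norms, norms))
  }

  if (method %in% c("pearson", "spearman")) {
    # stats::cor() correlates columns, so transpose to compare rows.
    return(stats::cor(t(x), method = method))
  }

  # Distance-based measures, converted to a similarity as 1 - distance.
  1 - as.matrix(stats::dist(x, method = method))
}

# Toy "embeddings": rows a and b point in the same direction.
emb <- rbind(
  a = c(1, 0, 2, 1),
  b = c(2, 0, 4, 2),
  c = c(0, 3, 1, 0)
)

sim <- similarity_matrix_sketch(emb, method = "cosine")
print(round(sim, 3))
```

The result is a square, symmetric matrix with one row and one column per text. Note that a distance-based measure converted as 1 - distance can be negative when the distance exceeds 1, so those scores are not bounded in the way cosine similarity is.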

