
text (version 0.9.50)

textCentrality: Compute the cosine semantic similarity between each single word's embedding and the aggregated embedding of all words.

Description

Computes the cosine semantic similarity score between the word embedding of each single word and the aggregated word embedding of all words.
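As a rough illustration of this computation, the base-R sketch below uses a small made-up embedding matrix (rows are words, columns are dimensions), averages the rows into one aggregated embedding, and computes each word's cosine similarity to that aggregate. The matrix, the word labels and the helper `cosine_sim` are hypothetical; this is not the text package's internal implementation.

```r
# Hypothetical toy embeddings: 3 words x 3 dimensions (not real textEmbed output)
emb <- matrix(
  c(1, 0, 0,
    0, 1, 0,
    1, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("calm", "peace", "balance"), NULL)
)

# Hypothetical helper: cosine similarity between two numeric vectors
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Aggregate all word embeddings into one vector (column means, as with "mean")
aggregated <- colMeans(emb)

# Cosine similarity of each single word's embedding to the aggregated embedding
central_sim <- apply(emb, 1, cosine_sim, b = aggregated)
print(round(central_sim, 3))
```

Here "balance" points in the same direction as the mean vector, so its similarity is 1, while "calm" and "peace" each score about 0.707.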

Usage

textCentrality(
  words,
  word_embeddings,
  single_word_embeddings = single_word_embeddings_df,
  aggregation = "mean",
  min_freq_words_test = 0
)

Arguments

words

Word or text variable to be plotted.

word_embeddings

Word embeddings from textEmbed for the words to be plotted (i.e., the aggregated word embeddings for the "words" variable).

single_word_embeddings

Word embeddings from textEmbed for individual words (i.e., the decontextualized word embeddings).

aggregation

Method used to aggregate the word embeddings (default = "mean"; other options are "min", "max" and "[CLS]").
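A minimal sketch of the "mean", "min" and "max" options, assuming they aggregate across words dimension by dimension (column-wise). The function name `aggregate_embeddings` and the toy matrix are hypothetical, and the "[CLS]" option is not sketched here.

```r
# Hypothetical toy embeddings: 3 words x 2 dimensions
emb <- matrix(c(1, 4,
                3, 2,
                5, 0), nrow = 3, byrow = TRUE)

# Hypothetical helper: column-wise aggregation of word embeddings
aggregate_embeddings <- function(x, aggregation = "mean") {
  switch(aggregation,
    mean = colMeans(x),
    min  = apply(x, 2, min),
    max  = apply(x, 2, max),
    stop("unsupported aggregation: ", aggregation)
  )
}

aggregate_embeddings(emb, "mean")  # returns c(3, 2)
aggregate_embeddings(emb, "min")   # returns c(1, 0)
aggregate_embeddings(emb, "max")   # returns c(5, 4)
```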

min_freq_words_test

Minimum number of times a word must occur to be included when computing the cosine semantic similarity scores (default = 0).
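To illustrate the frequency threshold, the sketch below counts words across a few made-up responses and keeps only those occurring at least `min_freq_words_test` times. Splitting on single spaces is an assumption for illustration; how the text package tokenizes and counts words is not shown here.

```r
# Hypothetical responses (each element is one person's words)
responses <- c("calm peace", "peace balance", "peace calm", "joy")

# Count how often each single word occurs across all responses
words <- unlist(strsplit(responses, " "))
freq <- table(words)

# Keep only words occurring at least min_freq_words_test times
min_freq_words_test <- 2
kept <- names(freq[freq >= min_freq_words_test])
print(kept)
```

With a threshold of 2, "calm" (2 occurrences) and "peace" (3) are kept, while "balance" and "joy" (1 each) are dropped; the default of 0 keeps every word.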

Value

A data frame with variables for the individual words (e.g., semantic similarity and frequency), used for plotting with the textCentralityPlot function.

See Also

textCentralityPlot, textProjection

Examples

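# NOT RUN {
# Load the text package, which provides the example datasets used below
library(text)

# Example word embeddings and data shipped with the text package
word_embeddings <- word_embeddings_4
data <- Language_based_assessment_data_8

# Similarity of each word in harmonywords to the aggregated embedding
df_for_plotting <- textCentrality(
  words = data$harmonywords,
  word_embeddings = word_embeddings$harmonywords,
  single_word_embeddings = word_embeddings$singlewords_we
)
df_for_plotting
# }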
