textCentrality: Compute semantic similarity score between single words' word embeddings and the aggregated word embedding of all words.

Description

Compute semantic similarity scores between the word embedding of each individual word and the aggregated word embedding of all words.

Usage

textCentrality(
  words,
  word_embeddings,
  word_types_embeddings = word_types_embeddings_df,
  method = "cosine",
  aggregation = "mean",
  min_freq_words_test = 0
)

Value

A data frame with variables (e.g., semantic similarity and frequencies) for the individual words; it is used for plotting in the textCentralityPlot function.

Arguments

words: Word or text variable to be plotted.
word_embeddings: Word embeddings from textEmbed for the words to be plotted (i.e., the aggregated word embeddings for the "words" variable).
word_types_embeddings: Word embeddings from textEmbed for individual words (i.e., the decontextualized word embeddings).
method: Character string describing the type of measure to be computed. Default is "cosine"; see also "spearman" and "pearson", as well as the measures from textDistance() (computed here as 1 - textDistance): "euclidean", "maximum", "manhattan", "canberra", "binary" and "minkowski".
aggregation: Method to aggregate the word embeddings (default = "mean"; see also "min", "max" or "[CLS]").
min_freq_words_test: Option to select only words that have occurred at least a specified number of times (default = 0) when creating the semantic similarity scores.
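
To make the method and aggregation arguments more concrete, the following base-R sketch computes the cosine similarity between each word's embedding and the mean-aggregated embedding of all words. It does not call the text package: the toy embedding matrix and the helper name cosine_sim are hypothetical, and the package's internal computation may differ in detail.

```r
# Toy word-type embeddings: one row per word, three dimensions (made-up values).
word_type_emb <- rbind(
  peace   = c(1, 0, 0),
  calm    = c(1, 1, 0),
  balance = c(0, 1, 0)
)

# Analogue of aggregation = "mean": average each dimension across all words.
aggregated <- colMeans(word_type_emb)

# Hypothetical helper: cosine similarity between two numeric vectors.
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# One similarity score per word, measured against the aggregated embedding.
central_sim <- apply(word_type_emb, 1, cosine_sim, b = aggregated)
central_sim
```

In this toy example "calm" points in the same direction as the aggregated embedding, so it receives the highest score; words further from the aggregate score lower.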

Examples

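if (FALSE) {
library(text)

# Compute centrality scores for the harmony words, using the example
# data and precomputed embeddings that ship with the text package.
df_for_plotting <- textCentrality(
  words = Language_based_assessment_data_8$harmonywords,
  word_embeddings = word_embeddings_4$texts$harmonywords,
  word_types_embeddings = word_embeddings_4$word_types
)

# Inspect the resulting data frame (input for textCentralityPlot).
df_for_plotting
}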
