textSimilarityTest: EXPERIMENTAL: Test whether there is a significant difference in meaning between two sets of texts (i.e., between their word embeddings).

Description

EXPERIMENTAL: Test whether there is a significant difference in meaning between two sets of texts (i.e., between their word embeddings).

Usage

textSimilarityTest(
  x,
  y,
  similarity_method = "cosine",
  Npermutations = 10000,
  method = "paired",
  center = FALSE,
  scale = FALSE,
  alternative = "greater",
  output.permutations = TRUE,
  N_cluster_nodes = 1,
  seed = 1001
)

Value

A list containing a p-value, the estimated similarity score and, if output.permutations = TRUE, the permuted values.

Arguments

x: Set of word embeddings from textEmbed.
y: Set of word embeddings from textEmbed.
similarity_method: Character string specifying the similarity measure to compute; default is "cosine". The measures from textDistance are also available, here computed as 1 - textDistance(): "euclidean", "maximum", "manhattan", "canberra", "binary" and "minkowski".
Npermutations: Number of permutations (default 10000).
method: Whether to compute a "paired" or an "unpaired" test (default "paired").
center: (boolean; from base::scale) If TRUE, centering is done by subtracting the column means (omitting NAs) of x from their corresponding columns; if FALSE (the default), no centering is done.
scale: (boolean; from base::scale) If TRUE, scaling is done by dividing the (centered) columns of x by their standard deviations if center is TRUE, and by the root mean square otherwise; if FALSE (the default), no scaling is done.
alternative: Whether to use a two-sided or a one-sided test; one of "two_sided", "less" or "greater" (default "greater").
output.permutations: If TRUE, returns permuted values in output.
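N_cluster_nodes: Number of cluster nodes to use (default 1; more nodes can speed up computation; see the parallel package).
seed: Seed for the random number generator (default 1001); change it to obtain a different set of permutations.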
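
The center and scale arguments are described as coming from base::scale, so their effect can be illustrated with base::scale on its own. The following is a minimal sketch on a made-up matrix (the values are not real embeddings); it shows only what base::scale does, not how textSimilarityTest applies it internally.

```r
# Toy matrix: rows stand for texts, columns for embedding dimensions.
# The values are invented for illustration only.
m <- matrix(c(1, 2, 3,
              4, 6, 8), nrow = 3)

# center = TRUE, scale = FALSE: subtract each column's mean.
centered <- scale(m, center = TRUE, scale = FALSE)
colMeans(centered)

# center = TRUE, scale = TRUE: additionally divide each column by its
# standard deviation.
standardized <- scale(m, center = TRUE, scale = TRUE)
apply(standardized, 2, sd)

# center = FALSE, scale = TRUE: divide each column by its root mean square,
# which base::scale computes as sqrt(sum(x^2) / (n - 1)).
rms_scaled <- scale(m, center = FALSE, scale = TRUE)
attr(rms_scaled, "scaled:scale")
```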

Examples


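library(text)

# Example word embeddings (word_embeddings_4 is assumed to be available
# from the text package)
x <- word_embeddings_4$texts$harmonywords
y <- word_embeddings_4$texts$satisfactionwords

# Paired, two-sided test; few permutations to keep the example fast
textSimilarityTest(x,
  y,
  method = "paired",
  Npermutations = 100,
  N_cluster_nodes = 1,
  alternative = "two_sided"
)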
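
To give a feel for what a permutation test on paired similarity scores involves, here is a minimal base R sketch. It is not the implementation used by textSimilarityTest: the helpers row_cosine and perm_similarity_sketch are hypothetical, the null distribution is built by simply shuffling the rows of y, and the data are simulated rather than real embeddings.

```r
# Row-wise cosine similarity between two equally sized numeric matrices.
row_cosine <- function(a, b) {
  rowSums(a * b) / (sqrt(rowSums(a^2)) * sqrt(rowSums(b^2)))
}

# Hypothetical helper, NOT part of the text package.
# Statistic: mean cosine similarity between paired rows of x and y.
# Null distribution: the same statistic after randomly re-pairing the rows.
perm_similarity_sketch <- function(x, y, n_perm = 1000, seed = 1001) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  stopifnot(nrow(x) == nrow(y), ncol(x) == ncol(y))
  set.seed(seed)

  observed <- mean(row_cosine(x, y))
  permuted <- replicate(n_perm, {
    shuffled <- y[sample(nrow(y)), , drop = FALSE]
    mean(row_cosine(x, shuffled))
  })

  # One-sided ("greater") p-value; the +1 terms keep it above zero.
  p_value <- (sum(permuted >= observed) + 1) / (n_perm + 1)
  list(p.value = p_value, estimate = observed, permuted = permuted)
}

# Simulated stand-in data: y_toy is a noisy copy of x_toy, so paired rows
# are more similar to each other than randomly re-paired rows.
set.seed(1)
x_toy <- matrix(rnorm(40 * 8), nrow = 40)
y_toy <- x_toy + matrix(rnorm(40 * 8, sd = 0.5), nrow = 40)

res <- perm_similarity_sketch(x_toy, y_toy, n_perm = 200)
res$p.value
res$estimate
```

The returned list mirrors the shape described under Value (a p-value, an estimate and the permuted values), but the names used here are assumptions made for this sketch.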
