text (version 0.9.99.2)

textSimilarityTest: EXPERIMENTAL: Test whether there is a significant difference in meaning between two sets of texts (i.e., between their word embeddings).

Description

EXPERIMENTAL: Test whether there is a significant difference in meaning between two sets of texts (i.e., between their word embeddings).

Usage

textSimilarityTest(
  x,
  y,
  similarity_method = "cosine",
  Npermutations = 10000,
  method = "paired",
  center = FALSE,
  scale = FALSE,
  alternative = "greater",
  output.permutations = TRUE,
  N_cluster_nodes = 1,
  seed = 1001
)

Value

A list containing a p-value, the similarity score estimate and, if output.permutations = TRUE, the permuted values.

Arguments

x

Set of word embeddings from textEmbed.

y

Set of word embeddings from textEmbed.

similarity_method

Character string specifying the similarity measure to compute; default is "cosine". The measures from textDistance are also available, here computed as 1 - textDistance(): "euclidean", "maximum", "manhattan", "canberra", "binary" and "minkowski".
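As a rough illustration of what these measures mean, the sketch below computes a cosine similarity and a "1 - distance" score for toy vectors in base R. The helper cosine_similarity is hypothetical and not part of the text package, and stats::dist is used only as a stand-in for textDistance.

```r
# Hypothetical helper for illustration; not the text package's implementation.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

a <- c(1, 2, 3)
b <- c(2, 4, 6)   # points in the same direction as a
d <- c(3, 0, -1)  # orthogonal to a (dot product is 0)

sim_same <- cosine_similarity(a, b)  # close to 1
sim_orth <- cosine_similarity(a, d)  # close to 0

# Assumption: a distance-based score taken as 1 - distance, as described above.
# stats::dist() supports the same method names listed for textDistance.
euclid_score <- 1 - as.numeric(stats::dist(rbind(a, b), method = "euclidean"))
```

Note that a score derived as 1 - distance is not bounded below, so it can be negative when the vectors are far apart.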

Npermutations

Number of permutations (default 10000).

method

Compute a "paired" or an "unpaired" test.
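The difference between the two designs can be sketched with a generic permutation scheme in base R. This is an assumption about how paired and unpaired resampling typically work, not a description of the package internals; permute_paired and permute_unpaired are hypothetical helpers and the matrices are toy data.

```r
set.seed(1001)
x <- matrix(rnorm(12), nrow = 4)  # toy "embeddings": 4 texts, 3 dimensions
y <- matrix(rnorm(12), nrow = 4)

# Paired: row i of x belongs with row i of y, so each pair is either
# kept as is or has its two members swapped.
permute_paired <- function(x, y) {
  swap <- sample(c(TRUE, FALSE), nrow(x), replace = TRUE)
  x_new <- x
  y_new <- y
  x_new[swap, ] <- y[swap, ]
  y_new[swap, ] <- x[swap, ]
  list(x = x_new, y = y_new)
}

# Unpaired: rows are pooled and reassigned to the two groups at random.
permute_unpaired <- function(x, y) {
  pooled <- rbind(x, y)
  idx <- sample(nrow(pooled))
  in_x <- seq_len(nrow(x))
  list(
    x = pooled[idx[in_x], , drop = FALSE],
    y = pooled[idx[-in_x], , drop = FALSE]
  )
}

p <- permute_paired(x, y)
u <- permute_unpaired(x, y)
```

A paired design requires x and y to have the same number of rows, since every text in x is matched with one text in y.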

center

(boolean; from base::scale) If center is TRUE then centering is done by subtracting the column means (omitting NAs) of x from their corresponding columns, and if center is FALSE, no centering is done.

scale

(boolean; from base::scale) If scale is TRUE then scaling is done by dividing the (centered) columns of x by their standard deviations if center is TRUE, and the root mean square otherwise.
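The effect of center and scale can be checked directly with base::scale on a small matrix. The matrix m below is toy data chosen for illustration.

```r
m <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)

# center = TRUE, scale = FALSE: subtract the column means
centered <- scale(m, center = TRUE, scale = FALSE)

# center = TRUE, scale = TRUE: centered columns divided by their standard deviations
standardized <- scale(m, center = TRUE, scale = TRUE)

# center = FALSE, scale = TRUE: columns divided by their root mean square,
# which base::scale computes as sqrt(sum(x^2) / (n - 1))
rms_scaled <- scale(m, center = FALSE, scale = TRUE)

colMeans(centered)          # column means after centering
apply(standardized, 2, sd)  # column standard deviations after standardizing
```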

alternative

Use a two-sided or one-sided test (select one of: "two_sided", "less", "greater").
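How the alternative affects a permutation p-value can be sketched as follows. The function p_value is a hypothetical helper, the null statistics are simulated stand-ins, and the two-sided rule shown (comparing absolute values) assumes a null distribution centered at zero; the package may compute it differently.

```r
set.seed(1001)
null_stats <- rnorm(1000)  # stand-in for statistics computed on permuted data
observed <- 1.5            # stand-in for the statistic on the original data

# Hypothetical helper: proportion of permuted statistics at least as
# extreme as the observed one, in the direction given by `alternative`.
p_value <- function(observed, null_stats, alternative = "greater") {
  switch(alternative,
    greater   = mean(null_stats >= observed),
    less      = mean(null_stats <= observed),
    two_sided = mean(abs(null_stats) >= abs(observed)),
    stop("unknown alternative: ", alternative)
  )
}

p_greater <- p_value(observed, null_stats, "greater")
p_less    <- p_value(observed, null_stats, "less")
p_two     <- p_value(observed, null_stats, "two_sided")
```

With a positive observed value, the "greater" p-value is small when few permuted statistics reach it, while the "less" p-value is correspondingly large.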

output.permutations

If TRUE, returns permuted values in output.

N_cluster_nodes

Number of cluster nodes to use (more nodes make the computation faster; see the parallel package).

seed

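Seed for the random number generator (default 1001). Set a different seed to obtain a different set of random permutations.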
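The role of a seed can be seen with base R's set.seed: the same seed reproduces the same random shuffle. This sketch only illustrates the general mechanism and assumes nothing about how the package applies the seed internally.

```r
# The same seed gives the same random permutation of 1:10
set.seed(1001)
first <- sample(10)

set.seed(1001)
second <- sample(10)

identical(first, second)

# A different seed starts the generator from a different state,
# so the shuffle will generally differ
set.seed(2002)
third <- sample(10)
```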

Examples

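library(text)

# Example word embeddings shipped with the text package
x <- word_embeddings_4$texts$harmonywords
y <- word_embeddings_4$texts$satisfactionwords

# Paired, two-sided test; a small number of permutations keeps the example fast
textSimilarityTest(x,
  y,
  method = "paired",
  Npermutations = 100,
  N_cluster_nodes = 1,
  alternative = "two_sided"
)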