quanteda (version 1.5.2)

textstat_proxy: [Experimental] Compute document/feature proximity

Description

This is the underlying function for textstat_dist and textstat_simil, but it returns a TsparseMatrix (a sparse matrix in triplet format).
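
A minimal sketch of a direct call may help here. It assumes quanteda 1.5.x, where textstat_proxy is exported from quanteda and dfm() accepts a named character vector. The toy documents are invented for illustration.

```r
library(quanteda)

# Toy documents, invented for illustration
txt <- c(d1 = "a b c d", d2 = "a b e f", d3 = "x y z")
x <- dfm(txt)

# Cosine proximity between documents
prox <- textstat_proxy(x, margin = "documents", method = "cosine")

class(prox)  # a sparse matrix class from the Matrix package
dim(prox)    # 3 x 3: one row and one column per document
```

Because the result is sparse, pairs with zero proximity (here, d3 against the other two documents) take up no storage.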

Usage

textstat_proxy(
  x,
  y = NULL,
  margin = c("documents", "features"),
  method = c("cosine", "correlation", "jaccard", "ejaccard", "dice", "edice", "hamman",
    "simple matching", "euclidean", "chisquared", "hamming", "kullback", "manhattan",
    "maximum", "canberra", "minkowski"),
  p = 2,
  min_proxy = NULL,
  rank = NULL,
  use_na = FALSE
)

Arguments

x

a dfm object; y is an optional target matrix matching x in the margin on which the similarity or distance will be computed.

y

if a dfm object is provided, proximity between documents or features in x and y is computed.

margin

identifies the margin of the dfm on which similarity or distance will be computed: "documents" for documents or "features" for word/term features.

method

character; the method identifying the similarity or distance measure to be used; see Details.

p

The power of the Minkowski distance.

min_proxy

the minimum proximity value to be recorded.

rank

an integer value specifying the top-n proximity values to be recorded.

use_na

if TRUE, return NA for proximity to empty vectors. Note that use of NA makes the proximity matrices denser.
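
The effect of margin and min_proxy can be sketched in base R on a small dense matrix. This is a conceptual illustration only, not the package's implementation. The helper cosine_cols and the toy counts are hypothetical, and it assumes that values below min_proxy are simply left out of the sparse result (shown here as zeros).

```r
# Toy document-feature counts (rows = documents, columns = features),
# invented for illustration
m <- matrix(c(1, 1, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("d1", "d2", "d3"), c("a", "b", "c", "d")))

# Hypothetical helper: cosine similarity between the columns of a matrix
cosine_cols <- function(v) {
  cp <- crossprod(v)        # all pairwise dot products between columns
  norms <- sqrt(diag(cp))   # Euclidean norm of each column
  cp / outer(norms, norms)
}

sim_docs  <- cosine_cols(t(m))  # like margin = "documents": 3 x 3
sim_feats <- cosine_cols(m)     # like margin = "features":  4 x 4

# min_proxy-style thresholding (assumed behaviour): values below the
# cutoff are not recorded, i.e. they become zeros in the result
min_proxy <- 0.5
kept <- sim_docs
kept[kept < min_proxy] <- 0
```

In this sketch, only the d1/d2 pair and the diagonal survive the 0.5 cutoff. Dropping the small values is what keeps the returned matrix sparse.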

See Also

textstat_dist, textstat_simil