find_nns: Return nearest neighbors based on cosine similarity

Description

Return the features in a set of pre-trained embeddings that are nearest to a target embedding, ranked by cosine similarity.

Usage

find_nns(
  target_embedding,
  pre_trained,
  N = 5,
  candidates = NULL,
  norm = "l2",
  stem = FALSE,
  language = "porter"
)

Value

(character) vector of nearest neighbors to target

Arguments

target_embedding

(numeric) 1 x D matrix. D = dimensions of pretrained embeddings.

pre_trained

(numeric) a F x D matrix corresponding to pretrained embeddings. F = number of features and D = embedding dimensions. rownames(pre_trained) = set of features for which there is a pre-trained embedding.

N

(numeric) number of nearest neighbors to return.

candidates

(character) vector of candidate features for nearest neighbors

norm

(character) - how to compute similarity (see ?text2vec::sim2):

"l2": cosine similarity

"none": inner product
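The difference between the two options can be sketched in base R. This is an illustration only, not the package's implementation: the toy matrix, the target vector, and the helper l2() are made up for the example.

```r
# Toy embeddings: 3 features x 2 dimensions (values invented for illustration)
pre_trained <- matrix(c(1, 0.9,
                        2, 2,
                        0, 3),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(c("a", "b", "c"), NULL))
target <- matrix(c(1, 1), nrow = 1)

# norm = "none": plain inner product of each row with the target
inner <- drop(pre_trained %*% t(target))

# norm = "l2": rescale every row to unit length first, which yields cosine similarity
l2 <- function(m) m / sqrt(rowSums(m^2))  # hypothetical helper, not part of any package
cosine <- drop(l2(pre_trained) %*% t(l2(target)))

# Rank features from most to least similar under each measure
nn_inner <- names(sort(inner, decreasing = TRUE))
nn_cosine <- names(sort(cosine, decreasing = TRUE))
```

In this toy case the two rankings differ: the inner product rewards the long vector "c", while cosine similarity only compares directions and so prefers "a".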

stem

(logical) - whether to stem candidates when evaluating nearest neighbors. Default is FALSE. If TRUE, candidate stems are ranked by their average cosine similarity to the target. We recommend removing misspelled words from candidates, as these can significantly influence the average.
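A minimal base R sketch of why misspellings matter when similarities are averaged by stem. The similarity values and the word-to-stem mapping below are invented for illustration; a real run would take stems from a stemmer rather than a hand-written vector.

```r
# Made-up cosine similarities between candidate words and a target
sims <- c(immigrant = 0.80, immigrants = 0.70, immigrat = 0.10, border = 0.60)

# Hand-written stems standing in for a stemmer's output (assumption)
stems <- c(immigrant = "immigr", immigrants = "immigr",
           immigrat = "immigr", border = "border")

# Average similarity per stem, then rank stems from most to least similar
mean_by_stem <- function(s) vapply(split(s, stems[names(s)]), mean, numeric(1))
ranked <- sort(mean_by_stem(sims), decreasing = TRUE)

# Same ranking after dropping the misspelled candidate "immigrat"
ranked_clean <- sort(mean_by_stem(sims[names(sims) != "immigrat"]), decreasing = TRUE)
```

With the misspelling included, the low score for "immigrat" drags the "immigr" average below "border"; once it is removed, "immigr" ranks first.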

language

(character) the name of a recognized language, as returned by SnowballC::getStemLanguages, or a two- or three-letter ISO-639 code corresponding to one of these languages (see references for the list of codes). Default is "porter". Only used when stem = TRUE.

Examples


library(conText)

# nearest neighbors of "immigration" among all features in cr_glove_subset
find_nns(target_embedding = cr_glove_subset['immigration',],
         pre_trained = cr_glove_subset, N = 5,
         candidates = NULL, norm = "l2", stem = FALSE)
