ncs: Given a set of embeddings and a set of tokenized contexts, find the top N nearest contexts.

Description

Given a set of embeddings and a set of tokenized contexts, find the top N nearest contexts.

Usage

ncs(x, contexts_dem, contexts = NULL, N = 5, as_list = TRUE)

Value

a data.frame or list of data.frames (one for each target) with the following columns:

target: (character) rownames of x, the labels of the ALC embeddings. NA if is.null(rownames(x)).
context: (character) contexts collapsed into single documents (i.e. untokenized). If contexts is NULL, this column instead contains the context (document) ids, which you can use to merge.
rank: (character) rank of context in terms of similarity with x.
value: (numeric) cosine similarity between x and context.
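
To make the rank and value columns concrete, the following is a minimal base R sketch of ranking candidate contexts by cosine similarity to a single target embedding. It is an illustration under stated assumptions, not the package's implementation: nearest_contexts, context_mat and target are hypothetical names, the embeddings are toy 2-dimensional vectors, and rank is kept as an integer here for simplicity.

```r
# Hypothetical helper (not part of conText): rank contexts by cosine
# similarity to one target embedding and keep the top N.
nearest_contexts <- function(target, context_mat, N = 5) {
  # cosine similarity between the target vector and each row of context_mat
  sims <- as.vector(context_mat %*% target) /
    (sqrt(rowSums(context_mat^2)) * sqrt(sum(target^2)))
  # indices of the N most similar contexts, most similar first
  top <- head(order(sims, decreasing = TRUE), N)
  data.frame(
    context = rownames(context_mat)[top],
    rank = seq_along(top),
    value = sims[top],
    stringsAsFactors = FALSE
  )
}

# toy embeddings for three candidate contexts (made-up values)
context_mat <- rbind(
  doc1 = c(1, 0),
  doc2 = c(1, 1),
  doc3 = c(0, 1)
)
target <- c(1, 0.1)

result <- nearest_contexts(target, context_mat, N = 2)
print(result)
```

With these toy vectors, doc1 points in nearly the same direction as the target, so it is ranked first, followed by doc2.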

Arguments

x: a (quanteda) dem-class or fem-class object.
contexts_dem: a dem-class object corresponding to the ALC embeddings of candidate contexts.
contexts: a (quanteda) tokens-class object of tokenized candidate contexts. Note, these must correspond to the same contexts in contexts_dem. If NULL, then the context (document) ids will be output instead of the text.
N: (numeric) number of nearest contexts to return.
as_list: (logical) if FALSE, all results are combined into a single data.frame. If TRUE, a list of data.frames is returned, with one data.frame per embedding.

Examples

library(quanteda)

# tokenize corpus
toks <- tokens(cr_sample_corpus)

# build a tokenized corpus of contexts surrounding a target term
immig_toks <- tokens_context(x = toks, pattern = "immigr*",
                             window = 6L, rm_keyword = FALSE)

# build document-feature matrix
immig_dfm <- dfm(immig_toks)

# construct document-embedding-matrix
immig_dem <- dem(immig_dfm, pre_trained = cr_glove_subset,
                 transform = TRUE, transform_matrix = cr_transform, verbose = FALSE)

# to get group-specific embeddings, average within party
immig_wv_party <- dem_group(immig_dem, groups = immig_dem@docvars$party)

# find nearest contexts by party
# as_list = TRUE returns a list with one data.frame per party;
# setting as_list = FALSE instead combines each group's
# results into a single data.frame (useful for joint plotting)
ncs(x = immig_wv_party, contexts_dem = immig_dem,
    contexts = immig_toks, N = 5, as_list = TRUE)
