Returns semantic neighborhood with semantic neighborhood size and density
SND(x, n = NA, threshold = 3.5, tvectors = tvectors)
A list of three elements:
neighbors: A named numeric vector of all identified neighbors, with the names being these neighbors and the values their similarity to x
n_size: The number of neighbors as a numeric
SND: The semantic neighborhood density (SND) as a numeric
x: a character vector of length 1 (a single target word), or a numeric vector of length ncol(tvectors), i.e., with the same dimensionality as the semantic space
n: if specified as a numeric, determines the size of the neighborhood as the n nearest words to x. If n = NA (default), the semantic neighborhood is determined according to a similarity threshold (see threshold)
threshold: specifies the similarity threshold that determines whether a word is counted as a neighbor of x, following the method by Buchanan et al. (2001) (see Description below)
tvectors: the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector)
Fritz Guenther
There are two principal approaches to determining the semantic neighborhood of a target word:
Set an a priori size of the semantic neighborhood to a fixed value n (e.g., Marelli & Baroni, 2015). The n closest words to the target word are counted as its semantic neighbors. The semantic neighborhood size is then necessarily n; the semantic neighborhood density is the mean similarity between these neighbors and the target word (see also plausibility)
Determine the semantic neighborhood based on a similarity threshold; all words whose similarity to the target word exceeds this threshold are counted as its semantic neighbors (e.g., Buchanan, Westbury, & Burgess, 2001). First, the similarity between the target word and all words in the semantic space is computed. These similarities are then transformed into z-scores. Traditionally, the threshold is set to z = 3.5 (e.g., Buchanan, Westbury, & Burgess, 2001).
If a single target word is used as x, this target word itself (which always has a similarity of 1 to itself) is excluded from these computations so that it cannot be counted as its own neighbor
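The two approaches above can be sketched in base R without the package. This is a minimal illustration, not the package's implementation: toy_space and the helpers target_similarities, snd_fixed and snd_threshold are hypothetical names, the vectors are made-up values, and two details are assumptions that the text above does not state: z-scores are computed with the sample standard deviation, and the density in the threshold approach is taken as the mean similarity of the neighbors, as in the fixed-size approach.

```r
# Toy semantic space (hypothetical values): every row is a word vector
toy_space <- rbind(
  cat   = c(1.0, 0.9, 0.1),
  dog   = c(0.9, 1.0, 0.2),
  mouse = c(0.8, 0.7, 0.3),
  car   = c(0.1, 0.2, 1.0),
  road  = c(0.0, 0.3, 0.9)
)

# Cosine similarity between a target word and all other words in the space.
# The target itself is excluded so it cannot be counted as its own neighbor.
target_similarities <- function(word, space) {
  target <- space[word, ]
  others <- space[rownames(space) != word, , drop = FALSE]
  sims <- as.vector(others %*% target) /
    (sqrt(rowSums(others^2)) * sqrt(sum(target^2)))
  names(sims) <- rownames(others)
  sort(sims, decreasing = TRUE)
}

# Approach 1: the n closest words are the neighbors
snd_fixed <- function(word, space, n) {
  sims <- target_similarities(word, space)
  neighbors <- head(sims, n)
  list(neighbors = neighbors,
       n_size = length(neighbors),
       SND = mean(neighbors))
}

# Approach 2: words whose z-scored similarity exceeds the threshold
# (assumption: sample SD for the z-scores, mean similarity as density)
snd_threshold <- function(word, space, threshold) {
  sims <- target_similarities(word, space)
  z <- (sims - mean(sims)) / sd(sims)
  neighbors <- sims[z > threshold]
  list(neighbors = neighbors,
       n_size = length(neighbors),
       SND = mean(neighbors))  # NaN if no word passes the threshold
}

fixed <- snd_fixed("cat", toy_space, n = 2)
thresh <- snd_threshold("cat", toy_space, threshold = 0.5)
```

In a five-word toy space no z-score can come close to 3.5, so the sketch uses a much lower threshold; the conventional value only makes sense in a semantic space with many words.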
Buchanan, L., Westbury, C., & Burgess, C. (2001). Characterizing semantic space: Neighborhood effects in word recognition. Psychonomic Bulletin & Review, 8, 531-544.
Marelli, M., & Baroni, M. (2015). Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics. Psychological Review, 122, 485-515.
cosine,
plot_neighbors,
compose
data(wonderland)
SND("cheshire",n=20,tvectors=wonderland)
SND("alice",threshold=2,tvectors=wonderland)