SND: Semantic neighborhood density

Description

Returns the semantic neighborhood of a target (a word or a vector), together with the semantic neighborhood size and density.

Usage

SND(x,n=NA,threshold=3.5,tvectors=tvectors)

Value

A list of three elements:

neighbors: A named numeric vector of all identified neighbors, with the names being the neighbor words and the values their similarity to x
n_size: The number of neighbors as a numeric
SND: The semantic neighborhood density (SND) as a numeric

Arguments

x: a single word (a character vector of length 1) or a numeric vector of length ncol(tvectors), i.e. with the same dimensionality as the semantic space
n: if specified as a numeric, determines the size of the neighborhood as the n nearest words to x. If n=NA (default), the semantic neighborhood is determined by a similarity threshold (see threshold).
threshold: specifies the similarity threshold (as a z-score) that determines whether a word is counted as a neighbor of x, following the method by Buchanan et al. (2001) (see Details below). Only used if n=NA.
tvectors: the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector)

Author

Fritz Guenther

Details

There are two principal approaches to determining the semantic neighborhood of a target word:

1. Set an a priori size of the semantic neighborhood to a fixed value n (e.g., Marelli & Baroni, 2015). The n closest words to the target word are counted as its semantic neighbors. The semantic neighborhood size is then necessarily n; the semantic neighborhood density is the mean similarity between these neighbors and the target word (see also plausibility).
2. Determine the semantic neighborhood based on a similarity threshold; all words whose similarity to the target word exceeds this threshold are counted as its semantic neighbors (e.g., Buchanan, Westbury, & Burgess, 2001). First, the similarity between the target word and all words in the semantic space is computed. These similarities are then transformed into z-scores, and every word whose z-score exceeds the threshold is counted as a neighbor. Traditionally, the threshold is set to z = 3.5, which is also the default value of threshold.
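The two definitions can be contrasted in a few lines of base R. The sketch below is a hypothetical helper, neighborhood_sketch(), not the package's own SND() code: it takes a named vector of similarities between a target and every other word (target already excluded) and returns a list shaped like the Value section. Two assumptions are labeled in the comments: the z-transform uses the sample standard deviation (as R's sd() does), and in the threshold case the density is taken to be the mean similarity of the neighbors, which this page only states explicitly for the fixed-n approach.

```r
# Hypothetical helper, NOT the package's SND() implementation: builds a
# semantic neighborhood from a named vector of similarities between a
# target and every other word (the target itself already excluded).
neighborhood_sketch <- function(sims, n = NA, threshold = 3.5) {
  if (!is.na(n)) {
    # Approach 1: fixed size, keep the n most similar words
    neighbors <- sort(sims, decreasing = TRUE)[seq_len(n)]
  } else {
    # Approach 2: z-transform the similarities (assumption: sample SD,
    # as computed by sd()) and keep every word whose z exceeds threshold
    z <- (sims - mean(sims)) / sd(sims)
    neighbors <- sort(sims[z > threshold], decreasing = TRUE)
  }
  list(neighbors = neighbors,
       n_size = as.numeric(length(neighbors)),
       # Assumption: density = mean similarity of the neighbors to the
       # target (NaN if the threshold leaves no neighbors)
       SND = mean(neighbors))
}

# Toy similarities, invented purely for illustration
sims <- c(a = 0.9, b = 0.8, c = 0.1, d = 0.05, e = 0, f = -0.1)
fixed  <- neighborhood_sketch(sims, n = 2)            # neighbors a, b; SND 0.85
thresh <- neighborhood_sketch(sims, threshold = 1.2)  # only a has z > 1.2
```

With the fixed-n approach the size is always n, whereas with the threshold approach the size depends on how skewed the similarity distribution is: here b is highly similar in absolute terms but falls just below z = 1.2.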

If a single target word is used as x, this target word itself (which always has a similarity of 1 to itself) is excluded from these computations, so that it cannot be counted as its own neighbor.
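The exclusion step can be illustrated with a hypothetical helper, similarities_to_all(), which is not part of the package. It assumes cosine similarity as the similarity measure (consistent with the self-similarity of 1 mentioned above, but not stated on this page) and a space laid out as described for tvectors, with one word vector per row and words as row names.

```r
# Hypothetical helper, NOT the package's code: cosine similarity of one
# word to every row of a semantic space, with the target word removed.
similarities_to_all <- function(word, tvectors) {
  target <- tvectors[word, ]
  # Cosine = dot product / (norm of each row * norm of the target)
  row_norms <- sqrt(rowSums(tvectors^2))
  sims <- as.vector(tvectors %*% target) / (row_norms * sqrt(sum(target^2)))
  names(sims) <- rownames(tvectors)
  # Drop the target itself: its self-similarity of 1 would otherwise
  # always make it its own nearest neighbor
  sims[names(sims) != word]
}

# Tiny invented three-word space, one word vector per row
space <- rbind(cat = c(1, 0, 1),
               dog = c(1, 0.2, 0.9),
               car = c(0, 1, 0))
s <- similarities_to_all("cat", space)  # similarities for dog and car only
```

Without the final filtering line, the target would sit at the top of any fixed-n neighborhood and would also inflate the mean and spread used for the z-transform.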

References

Buchanan, L., Westbury, C., & Burgess, C. (2001). Characterizing semantic space: Neighborhood effects in word recognition. Psychonomic Bulletin & Review, 8, 531-544.

Marelli, M., & Baroni, M. (2015). Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics. Psychological Review, 122, 485-515.

Examples

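library(LSAfun)
data(wonderland)

# Fixed-size neighborhood: the 20 nearest neighbors of "cheshire"
SND("cheshire",n=20,tvectors=wonderland)

# Threshold-based neighborhood: all words with a z-scored similarity above 2
SND("alice",threshold=2,tvectors=wonderland)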