This function quantifies the bias in a set of word embeddings using the normalized association score (NAS) method of Caliskan et al. (2017). In comparison to WEAT, introduced in the same paper, this method is more suitable for continuous ground truth data. See Figure 1 and Figure 2 of the original paper. If possible, please use query() instead.
Usage:

nas(w, S_words, A_words, B_words, verbose = FALSE)
Value:

A list with class "nas" containing the following components:
$P: a vector of normalized association scores, one for every word in S_words
$raw: a list of raw results used for calculating the normalized association scores
$S_words: the input S_words
$A_words: the input A_words
$B_words: the input B_words
Arguments:

w: a numeric matrix of word embeddings, e.g. from read_word2vec(); see the loading sketch after this list
S_words: a character vector of the first set of target words. In an example of studying gender stereotypes, it can include occupations such as programmer, engineer, scientist...
A_words: a character vector of the first set of attribute words. In an example of studying gender stereotypes, it can include words such as man, male, he, and his.
B_words: a character vector of the second set of attribute words. In an example of studying gender stereotypes, it can include words such as woman, female, she, and her.
verbose: logical, whether to display information
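The w argument expects a dense matrix with one row per word in the vocabulary. A minimal loading sketch, assuming read_word2vec() takes the path to a word2vec-format file as its first argument (the file name below is a placeholder):

## hypothetical path to a word2vec-format embedding file
w <- read_word2vec("path/to/word_vectors.txt")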
References:

Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186. https://doi.org/10.1126/science.aal4230
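A minimal usage sketch. The tiny random matrix below only illustrates the expected input format (rownames are the vocabulary); with real pretrained embeddings the scores become substantively meaningful. The package named in library() is an assumption about where nas() lives.

library(sweater)  ## assuming nas() is provided by the sweater package

set.seed(42)
words <- c("programmer", "engineer", "scientist",
           "man", "male", "he", "his",
           "woman", "female", "she", "her")
## toy 50-dimensional embeddings, one row per word
w <- matrix(rnorm(length(words) * 50), nrow = length(words),
            dimnames = list(words, NULL))

S_words <- c("programmer", "engineer", "scientist")
A_words <- c("man", "male", "he", "his")
B_words <- c("woman", "female", "she", "her")

res <- nas(w, S_words, A_words, B_words)
res$P  ## normalized association score for each word in S_words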