sweater (version 0.1.8)

weat: Speedy Word Embedding Association Test

Description

This function tests for bias in a set of word embeddings using the method of Caliskan et al. (2017). If possible, please use query() instead.

Usage

weat(w, S_words, T_words, A_words, B_words, verbose = FALSE)

Value

A list with class "weat" containing the following components:

  • $S_diff for each word in S_words, the difference between its mean cosine similarity to the words in A_words and its mean cosine similarity to the words in B_words

  • $T_diff for each word in T_words, the difference between its mean cosine similarity to the words in A_words and its mean cosine similarity to the words in B_words

  • $S_words the input S_words

  • $T_words the input T_words

  • $A_words the input A_words

  • $B_words the input B_words

weat_es() can be used to obtain the effect size of the test; weat_resampling() can be used for a test of significance.
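
The per-word association scores and the effect size can be illustrated in base R. This is a minimal sketch following the definitions in Caliskan et al. (2017), not the package's own implementation: the toy embedding matrix, the helper names cos_sim() and assoc(), and the use of the sample standard deviation are all assumptions made for illustration.

```r
# Toy 3-dimensional embeddings; words and values are made up for illustration.
w <- rbind(
  math    = c(0.9, 0.1, 0.2),
  algebra = c(0.8, 0.2, 0.1),
  poetry  = c(0.1, 0.9, 0.3),
  art     = c(0.2, 0.8, 0.2),
  he      = c(1.0, 0.0, 0.1),
  man     = c(0.9, 0.1, 0.0),
  she     = c(0.0, 1.0, 0.1),
  woman   = c(0.1, 0.9, 0.0)
)

S_words <- c("math", "algebra")
T_words <- c("poetry", "art")
A_words <- c("he", "man")
B_words <- c("she", "woman")

# Cosine similarity between two numeric vectors (hypothetical helper).
cos_sim <- function(x, y) sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))

# Association of one target word with A relative to B (hypothetical helper):
# mean similarity to the A words minus mean similarity to the B words.
assoc <- function(word, A, B, w) {
  sim_a <- vapply(A, function(a) cos_sim(w[word, ], w[a, ]), numeric(1))
  sim_b <- vapply(B, function(b) cos_sim(w[word, ], w[b, ]), numeric(1))
  mean(sim_a) - mean(sim_b)
}

S_diff <- vapply(S_words, assoc, numeric(1), A = A_words, B = B_words, w = w)
T_diff <- vapply(T_words, assoc, numeric(1), A = A_words, B = B_words, w = w)

# Effect size: difference of the mean associations, scaled by the standard
# deviation over all target words (sample SD is an assumption here).
es <- (mean(S_diff) - mean(T_diff)) / sd(c(S_diff, T_diff))

print(S_diff)
print(T_diff)
print(es)
```

In this toy setup the S words lean towards the A words and the T words towards the B words, so S_diff is positive, T_diff is negative, and the effect size is positive.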

Arguments

w

a numeric matrix of word embeddings, e.g. from read_word2vec()

S_words

a character vector of the first set of target words. In an example of studying gender stereotype, it can include occupations such as programmer, engineer, scientist...

T_words

a character vector of the second set of target words. In an example of studying gender stereotype, it can include occupations such as nurse, teacher, librarian...

A_words

a character vector of the first set of attribute words. In an example of studying gender stereotype, it can include words such as man, male, he, his.

B_words

a character vector of the second set of attribute words. In an example of studying gender stereotype, it can include words such as woman, female, she, her.

verbose

logical, whether to display information

References

Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186. doi:10.1126/science.aal4230

Examples

library(sweater)

# Reproduce the effect size in Caliskan et al. (2017), Table 1, "Math vs. Arts"
data(glove_math)
S1 <- c("math", "algebra", "geometry", "calculus", "equations",
        "computation", "numbers", "addition")
T1 <- c("poetry", "art", "dance", "literature", "novel", "symphony", "drama", "sculpture")
A1 <- c("male", "man", "boy", "brother", "he", "him", "his", "son")
B1 <- c("female", "woman", "girl", "sister", "she", "her", "hers", "daughter")
sw <- weat(glove_math, S1, T1, A1, B1)
weat_es(sw)
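
The idea behind a resampling test of significance can be sketched in base R as well. This is only an illustration of the permutation approach described by Caliskan et al. (2017), not the code behind weat_resampling(): the association scores below are hypothetical stand-ins for $S_diff and $T_diff, and the helper test_stat(), the number of resamples, and the add-one p-value correction are assumptions.

```r
set.seed(1)

# Hypothetical per-word association scores standing in for $S_diff and $T_diff.
S_diff <- c(0.30, 0.25, 0.40, 0.35)
T_diff <- c(-0.20, -0.10, -0.30, -0.25)

# Test statistic: total association of the S words minus that of the T words.
test_stat <- function(s, t) sum(s) - sum(t)
observed <- test_stat(S_diff, T_diff)

# Randomly re-partition the pooled scores into two groups of the original
# sizes and recompute the statistic for each partition.
pooled <- c(S_diff, T_diff)
n_s <- length(S_diff)
n_resample <- 999
resampled <- replicate(n_resample, {
  idx <- sample(length(pooled), n_s)
  test_stat(pooled[idx], pooled[-idx])
})

# One-sided p-value: share of partitions at least as extreme as the observed
# one (the add-one correction is an assumption of this sketch).
p_value <- (sum(resampled >= observed) + 1) / (n_resample + 1)
print(p_value)
```

A small p-value indicates that few random partitions of the target words yield a statistic as large as the observed split.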