Learn R Programming

sweater (version 0.1.8)

ect: Embedding Coherence Test

Description

This function estimate the Embedding Coherence Test (ECT) of word embeddings (Dev & Philips, 2019). If possible, please use query() instead.

Usage

ect(w, S_words, A_words, B_words, verbose = FALSE)

Value

A list with class "ect" containing the following components:

  • $A_words the input A_words

  • $B_words the input B_words

  • $S_words the input S_words

  • $u_a Cosine similarity between each word vector of S_words and average vector of A_words

  • $u_b Cosine similarity between each word vector of S_words and average vector of B_words

Arguments

w

a numeric matrix of word embeddings, e.g. from read_word2vec()

S_words

a character vector of the first set of target words. In an example of studying gender stereotype, it can include occupations such as programmer, engineer, scientists...

A_words

a character vector of the first set of attribute words. In an example of studying gender stereotype, it can include words such as man, male, he, his.

B_words

a character vector of the second set of attribute words. In an example of studying gender stereotype, it can include words such as woman, female, she, her.

verbose

logical, whether to display information

References

Dev, S., & Phillips, J. (2019, April). Attenuating bias in word vectors. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 879-887). PMLR.

See Also

ect_es() can be used to obtain the effect size of the test. plot_ect() can be used to visualize the result.

Examples

Run this code
data(googlenews)
S1 <- c("janitor", "statistician", "midwife", "bailiff", "auctioneer",
"photographer", "geologist", "shoemaker", "athlete", "cashier", "dancer",
"housekeeper", "accountant", "physicist", "gardener", "dentist", "weaver",
"blacksmith", "psychologist", "supervisor", "mathematician", "surveyor",
"tailor", "designer", "economist", "mechanic", "laborer", "postmaster",
"broker", "chemist", "librarian", "attendant", "clerical", "musician",
"porter", "scientist", "carpenter", "sailor", "instructor", "sheriff",
"pilot", "inspector", "mason", "baker", "administrator", "architect",
"collector", "operator", "surgeon", "driver", "painter", "conductor",
"nurse", "cook", "engineer", "retired", "sales", "lawyer", "clergy",
"physician", "farmer", "clerk", "manager", "guard", "artist", "smith",
"official", "police", "doctor", "professor", "student", "judge",
"teacher", "author", "secretary", "soldier")
A1 <- c("he", "son", "his", "him", "father", "man", "boy", "himself",
"male", "brother", "sons", "fathers", "men", "boys", "males", "brothers",
"uncle", "uncles", "nephew", "nephews")
B1 <- c("she", "daughter", "hers", "her", "mother", "woman", "girl",
"herself", "female", "sister", "daughters", "mothers", "women", "girls",
"females", "sisters", "aunt", "aunts", "niece", "nieces")
garg_f1 <- ect(googlenews, S1, A1, B1)
plot_ect(garg_f1)

Run the code above in your browser using DataLab