- x
a (quanteda) corpus object
- target
(character) vector of words
- first_vec
(character) vector of words
- second_vec
(character) vector of words
- pre_trained
(numeric) an F x D matrix of pre-trained embeddings,
usually trained on the same corpus as the one used for x.
F = number of features and D = embedding dimensions.
rownames(pre_trained) = set of features for which there is a pre-trained embedding
- transform_matrix
(numeric) a D x D 'a la carte' transformation matrix.
D = dimensions of pretrained embeddings.
- group_var
(character) name of the variable in the corpus object that defines the groups
- window
(numeric) - size of a context: the number of words on each side of the target
- norm
(character) - "l2" for L2-normalized cosine similarity and "none" for the unnormalized dot product
- remove_punct
(logical) - if TRUE remove all characters in the Unicode
"Punctuation" [P] class
- remove_symbols
(logical) - if TRUE remove all characters in the Unicode
"Symbol" [S] class
- remove_numbers
(logical) - if TRUE remove tokens that consist only of
numbers, but not words that start with digits, e.g. 2day
- remove_separators
(logical) - if TRUE remove separators and separator
characters (Unicode "Separator" [Z] and "Control" [C] categories)
- valuetype
(character) - the type of pattern matching: "glob" for "glob"-style
wildcard expressions; "regex" for regular expressions; or "fixed" for
exact matching
- hard_cut
(logical) - if TRUE, a context must have exactly window x 2 tokens;
if FALSE, it can have window x 2 or fewer (e.g. if a document begins with a target word,
the context will have window tokens rather than window x 2)
- case_insensitive
(logical) - if TRUE, ignore case when matching a
target pattern
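
The interaction of window, hard_cut and norm can be sketched in base R. This is an illustration only, not the package's implementation: get_contexts() and similarity() are hypothetical helpers, and the sketch assumes that hard_cut = TRUE simply drops contexts shorter than window x 2.

```r
# Hypothetical helper (not a package function): collect up to `window`
# tokens on each side of every occurrence of `target`.
get_contexts <- function(tokens, target, window = 2, hard_cut = FALSE) {
  hits <- which(tokens == target)
  contexts <- lapply(hits, function(i) {
    left  <- tokens[seq_len(i - 1)]                   # tokens before the target
    right <- tokens[seq_len(length(tokens) - i) + i]  # tokens after the target
    c(tail(left, window), head(right, window))
  })
  if (hard_cut) {
    # Assumption: contexts without a full window on both sides are dropped.
    contexts <- Filter(function(ctx) length(ctx) == 2 * window, contexts)
  }
  contexts
}

# Hypothetical helper: cosine similarity for norm = "l2",
# plain dot product for norm = "none".
similarity <- function(a, b, norm = c("l2", "none")) {
  norm <- match.arg(norm)
  if (norm == "l2") {
    a <- a / sqrt(sum(a^2))
    b <- b / sqrt(sum(b^2))
  }
  sum(a * b)
}

toks <- c("immigration", "reform", "is", "debated",
          "while", "immigration", "policy", "stalls")

# The first occurrence opens the document, so its context has only
# `window` tokens; the second has the full window x 2.
soft <- get_contexts(toks, "immigration", window = 2, hard_cut = FALSE)
hard <- get_contexts(toks, "immigration", window = 2, hard_cut = TRUE)

u <- c(3, 4)
v <- c(6, 8)
cos_uv <- similarity(u, v, norm = "l2")    # vectors point the same way
dot_uv <- similarity(u, v, norm = "none")  # also reflects vector length
```

With hard_cut = FALSE both occurrences yield a context (of 2 and 4 tokens); with hard_cut = TRUE only the 4-token context remains. The two vectors are parallel, so norm = "l2" gives a similarity of about 1, while norm = "none" gives 50 because the dot product also scales with vector length.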