get_centroid: Word embedding semantic centroid extractor

Description

The function returns the average of the word vectors for a set of anchor
terms. This average roughly corresponds to the intersection of the
contexts in which each word is used. The resulting semantic centroid can
be used for a variety of purposes, in particular as input to
<code>CMDist()</code>. <code>get_centroid()</code> accepts a list of terms,
a string of terms, a data.frame, or a matrix; for a data.frame or matrix,
only the first column is used. The vectors are aggregated with a simple
arithmetic mean. Terms can be repeated, in which case each term is
effectively "weighted" by the number of times it appears.
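The averaging described above can be illustrated with a minimal base-R sketch. It assumes a hypothetical toy embedding matrix <code>toy_wv</code> and a hypothetical helper <code>centroid_sketch()</code>. The sketch only mirrors the documented behavior (first column of a data.frame or matrix, simple mean, repeated terms counted more than once) and is not the text2map implementation of <code>get_centroid()</code>.

```r
# Minimal base-R sketch (NOT the text2map implementation) of averaging
# anchor-term vectors into a semantic centroid.

# Hypothetical toy embedding: 4 words (rows) x 3 dimensions (columns)
toy_wv <- matrix(
  c(1, 0, 0,
    0, 1, 0,
    0, 0, 1,
    1, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("man", "woman", "king", "queen"), NULL)
)

# Hypothetical helper mirroring the documented behavior
centroid_sketch <- function(anchors, wv) {
  # For a data.frame or matrix, only the first column is used
  if (is.data.frame(anchors) || is.matrix(anchors)) {
    anchors <- anchors[, 1]
  }
  # Flatten a list (or factor/character vector) into plain character terms
  terms <- as.character(unlist(anchors))
  # Look up one row per term; a repeated term contributes its row repeatedly,
  # which is what "weights" it by its count
  vecs <- wv[terms, , drop = FALSE]
  # Simple arithmetic mean of the rows, kept as a 1 x d matrix
  matrix(colMeans(vecs), nrow = 1,
         dimnames = list("centroid", colnames(wv)))
}

centroid_sketch(c("man", "woman"), toy_wv)           # values: 0.5, 0.5, 0
centroid_sketch(c("man", "man", "woman"), toy_wv)    # values: 2/3, 1/3, 0
centroid_sketch(data.frame(term = c("king", "queen")), toy_wv)  # values: 0.5, 0.5, 1
```

Returning a 1-row matrix keeps the centroid in the same rows-as-words layout as <code>wv</code>; whether <code>get_centroid()</code> itself returns exactly this shape is an assumption of the sketch.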

text2map: R Tools for Text Matrices, Embeddings, and Networks

text2map is a collection of functions optimized for working with
various kinds of text matrices. Focusing on the text matrix as the
primary object, represented either as a base R dense matrix or a
'Matrix' package sparse matrix, allows for a consistent and intuitive
interface that stays close to the underlying mathematical foundation
of computational text analysis. In particular, the package includes
functions for working with word embeddings, text networks, and
document-term matrices. It includes methods developed in
Stoltz and Taylor (2019) <doi:10.1007/s42001-019-00048-6>,
Taylor and Stoltz (2020) <doi:10.1007/s42001-020-00075-8>,
Taylor and Stoltz (2020) <doi:10.15195/v7.a23>, and
Stoltz and Taylor (2021) <doi:10.1016/j.poetic.2021.101567>.

Author

Dustin Stoltz and Marshall Taylor

Arguments

<dl><dt>anchors</dt>
<dd>Terms to be averaged: a list of terms, a string of terms,
a data.frame, or a matrix.</dd>
<dt>wv</dt>
<dd>Matrix of word embedding vectors (a.k.a. embedding model)
with rows as words.</dd>
<dt>missing</dt>
<dd>What action to take if terms are not in the embeddings.
If <code>missing = "stop"</code> (default), the function stops
and an error message states which terms are missing.
If <code>missing = "remove"</code>, missing terms (or rows with missing
terms) are removed, and the missing terms are printed as a message.</dd></dl>
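The two <code>missing</code> behaviors can be sketched in base R as follows. This assumes a plain vector of terms, a hypothetical toy matrix <code>toy_wv</code>, and a hypothetical helper <code>check_missing_sketch()</code>; the message and error wording is illustrative rather than the text the package prints, and row-wise removal for data.frame input is not covered.

```r
# Minimal base-R sketch (NOT the text2map implementation) of the two
# documented `missing` behaviors, for a plain vector of terms.

# Hypothetical toy embedding with two known words
toy_wv <- matrix(c(1, 0,
                   0, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("man", "woman"), NULL))

# Hypothetical helper; `missing` mirrors the documented argument name
check_missing_sketch <- function(terms, wv, missing = c("stop", "remove")) {
  # Defaults to "stop" when the argument is not supplied
  missing <- match.arg(missing)
  # Terms that have no row in the embedding matrix
  absent <- setdiff(terms, rownames(wv))
  if (length(absent) > 0) {
    if (missing == "stop") {
      # "stop": halt with an error naming the missing terms
      stop("terms not in embeddings: ", paste(absent, collapse = ", "))
    }
    # "remove": report the missing terms as a message, then drop them
    message("removing terms not in embeddings: ",
            paste(absent, collapse = ", "))
    terms <- terms[terms %in% rownames(wv)]
  }
  terms
}

# "remove": emits a message naming "dragon" and keeps only "man"
check_missing_sketch(c("man", "dragon"), toy_wv, missing = "remove")
# default "stop": raises an error naming "dragon"
try(check_missing_sketch(c("man", "dragon"), toy_wv))
```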
