doc_similarity

Given a document-term matrix (DTM), this function returns the
similarities between documents using a specified method (see Details).
The result is a square document-by-document similarity matrix (DSM),
equivalent to a weighted adjacency matrix in network analysis.
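
To make the shape of the result concrete, here is a minimal base R sketch of how a cosine-based DSM could be computed from a small DTM. It is an illustration only and should not be read as the package's implementation: the toy matrix and the `cosine_dsm` helper are hypothetical names introduced for this example.

```r
# Toy document-term matrix: 3 documents (rows) by 4 terms (columns).
# The counts and labels are invented for illustration.
dtm <- matrix(
  c(2, 1, 0, 0,
    4, 2, 0, 0,
    0, 0, 1, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("doc1", "doc2", "doc3"),
    c("cat", "dog", "tax", "law")
  )
)

# Hypothetical helper (not part of text2map): cosine similarity
# between every pair of rows in x.
cosine_dsm <- function(x) {
  # L2 norm of each document's term-count vector
  norms <- sqrt(rowSums(x^2))
  # Dot products for all document pairs, scaled by the product of norms
  tcrossprod(x) / outer(norms, norms)
}

# Square document-by-document similarity matrix
dsm <- cosine_dsm(dtm)
print(round(dsm, 3))
```

Here `doc1` and `doc2` use the same terms in the same proportions, so their similarity is close to 1, while `doc3` shares no terms with either and scores 0. Because the result is square and symmetric, it can be treated as a weighted adjacency matrix, as described above.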

This is a collection of functions optimized for working
with various kinds of text matrices. Focusing on
the text matrix as the primary object - represented
either as a base R dense matrix or a 'Matrix' package sparse
matrix - allows for a consistent and intuitive interface
that stays close to the underlying mathematical foundation
of computational text analysis. In particular, the package
includes functions for working with word embeddings,
text networks, and document-term matrices. Methods developed in
Stoltz and Taylor (2019) <doi:10.1007/s42001-019-00048-6>,
Taylor and Stoltz (2020) <doi:10.1007/s42001-020-00075-8>,
Taylor and Stoltz (2020) <doi:10.15195/v7.a23>, and
Stoltz and Taylor (2021) <doi:10.1016/j.poetic.2021.101567>.

Dustin Stoltz

text2map

R Tools for Text Matrices, Embeddings, and Networks

Marshall Taylor

doc_similarity function

<dl><dt>x</dt>
<dd>Document-term matrix with terms as columns.</dd>
<dt>y</dt>
<dd>Optional second matrix (default = <code>NULL</code>).</dd>
<dt>method</dt>
<dd>Character vector indicating similarity method, including
projection, cosine, wmd, and centroid (see Details).</dd>
<dt>wv</dt>
<dd>Matrix of word embedding vectors (a.k.a. embedding model)
with rows as words. Required for "wmd" and "centroid"
similarities.</dd></dl>

Arguments

Author

Find similarities between documents - doc_similarity

