
keyclust (version 1.2.5)

similarity_matrix: Create a cosine similarity matrix from a fitted word embedding model

Description

This function takes a fitted word embedding model and computes the cosine similarity between each pair of words.
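To show what "cosine similarity between each pair of words" means, here is a minimal base-R sketch. It is not keyclust's implementation. It assumes the embeddings are a numeric matrix with one row per word, and the helper name `cosine_sim_matrix` is hypothetical. Each row is scaled to unit length, and the matrix of dot products between unit vectors is then the matrix of cosines.

```r
# Hypothetical helper (not part of keyclust): cosine similarity between all rows.
# emb: numeric matrix, one row per word, one column per embedding dimension.
cosine_sim_matrix <- function(emb) {
  norms <- sqrt(rowSums(emb^2))   # Euclidean length of each word vector
  unit <- emb / norms             # recycling divides row i by norms[i]
  sim <- tcrossprod(unit)         # unit %*% t(unit): pairwise dot products = cosines
  dimnames(sim) <- list(rownames(emb), rownames(emb))
  sim
}

# Toy 2-dimensional embeddings for three words
emb <- matrix(c(1, 0,
                1, 1,
                0, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("cat", "dog", "car"), NULL))

sim <- cosine_sim_matrix(emb)
round(sim, 3)
```

The result is symmetric with ones on the diagonal. A word whose vector is all zeros has no defined direction, so in this sketch its row and column would be NaN.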

Usage

similarity_matrix(x, words = NULL, max_terms = 25000)

Value

An N x N matrix of cosine similarity scores between words from a fitted word embedding model, where N is the number of words retained (at most max_terms).

Arguments

x

A word embedding matrix

words

A character vector of words, or the name of the column in x that holds the words of the fitted word embeddings

max_terms

The maximum number of embedding terms to include in the output similarity matrix. Assumes that the embedding input is ordered by word frequency, so the terms kept are the most frequent ones.
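A minimal sketch of what the max_terms cap implies, under the documented assumption that rows run from most to least frequent word. The helper `keep_top_terms` is hypothetical and only illustrates the idea; it is not keyclust's code.

```r
# Hypothetical helper (not part of keyclust): keep only the first max_terms rows,
# which are the most frequent words if the input is ordered by frequency.
keep_top_terms <- function(emb, max_terms = 25000) {
  n <- min(nrow(emb), max_terms)
  emb[seq_len(n), , drop = FALSE]
}

set.seed(1)
emb <- matrix(rnorm(10 * 4), nrow = 10,
              dimnames = list(paste0("w", 1:10), NULL))

top <- keep_top_terms(emb, max_terms = 3)
rownames(top)
```

The cap matters because the output grows quadratically: a dense N x N matrix of doubles takes 8 * N^2 bytes, so 25,000 terms already means about 5 GB.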

Examples

library(keyclust)

# Build a cosine similarity matrix from the sample fastText embeddings,
# using the "words" column to identify each term
simmat <- similarity_matrix(wordemb_FasttextEng_sample, words = "words")
