About

conText provides a fast, flexible and transparent framework to estimate context-specific word and short document embeddings using the 'a la carte' embeddings approach developed by Khodak et al. (2018) and evaluate hypotheses about covariate effects on embeddings using the regression framework developed by Rodriguez et al. (2021).

How to Install

install.packages("conText")

Datasets

To use conText you will need three objects:

  1. A (quanteda) corpus with the documents and corresponding document variables you want to evaluate.
  2. A set of (GloVe) pre-trained embeddings.
  3. A transformation matrix specific to the pre-trained embeddings.

conText includes sample objects for all three, but keep in mind that these are only meant to illustrate how the functions work. In this Dropbox folder we have included the raw versions of these objects, including the full Stanford GloVe 300-dimensional embeddings (labeled glove.rds) and the corresponding transformation matrix estimated by Khodak et al. (2018) (labeled khodakA.rds). We also provide an equivalent RDS file for the 2024 GloVe embeddings released in July 2025 (labeled _glove_2024.rds).
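
As a rough sketch of how these three objects are typically loaded, the R snippet below uses the bundled sample objects (cr_sample_corpus, cr_glove_subset, cr_transform); the commented readRDS() calls are illustrative only and assume you have downloaded the full files from the Dropbox folder into your working directory.

library(conText)

# 1. a quanteda corpus with document variables (bundled sample)
corp <- cr_sample_corpus

# 2. pre-trained (GloVe) embeddings (bundled subset of the full embeddings)
pre_trained <- cr_glove_subset

# 3. transformation matrix matching those embeddings (bundled sample)
transform_matrix <- cr_transform

# for real applications, swap in the full objects from the Dropbox folder,
# adjusting the paths to wherever you saved them:
# pre_trained <- readRDS("glove.rds")
# transform_matrix <- readRDS("khodakA.rds")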

Quick Start Guides

Check out this Quick Start Guide to get going with conText (last updated: 07/28/2025).

Latest Updates

As noted in Rodriguez et al. (2023) (p. 1272), distance measures typically used to compare representations in high-dimensional space (such as embedding vectors) exhibit statistical bias. In Green et al. (2025), we explore the severity of this problem for text-as-data applications and provide and validate a bias correction for the squared Euclidean distance. We implement this estimator and other recommendations from the paper in the latest update to the conText() function. Please refer to the Bias in Distance Measures vignette for additional information and the Quick Start Guide for examples of how to use the new version of the function and a description of changes in the output.
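
As a minimal sketch of an embedding regression call using the bundled sample objects (argument names follow earlier releases of conText(); the exact interface and output of the updated function are described in the Quick Start Guide):

library(conText)
library(quanteda)

# tokenize the sample corpus
toks <- tokens(cr_sample_corpus, remove_punct = TRUE, remove_symbols = TRUE)

# regress ALC embeddings of "immigration" on party and gender
set.seed(2025)
model <- conText(formula = immigration ~ party + gender,
                 data = toks,
                 pre_trained = cr_glove_subset,
                 transform = TRUE, transform_matrix = cr_transform,
                 bootstrap = TRUE, num_bootstraps = 100,
                 permute = TRUE, num_permutations = 100,
                 window = 6, case_insensitive = TRUE,
                 verbose = FALSE)

# normed coefficients (one per covariate level); slot names may differ in 3.0.0
model@normed_coefficients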

Multilanguage Resources

For those working in languages other than English, we have a set of data and code resources here.

Monthly Downloads: 231
Version: 3.0.0
License: GPL-3
Maintainer: Sofia Avila
Last Published: September 3rd, 2025

Functions in conText (3.0.0)

dem-class: Virtual class "dem" for a document-embedding matrix
cr_sample_corpus: Congressional Record sample corpus
conText: Embedding regression
contrast_nns: Contrast nearest neighbors
cos_sim: Compute the cosine similarity between one or more ALC embeddings and a set of features
dem_sample: Randomly sample documents from a dem
get_context: Get context words (words within a symmetric window around the target word/phrase) surrounding a user-defined target
find_nns: Return nearest neighbors based on cosine similarity
get_grouped_similarity: Get averaged similarity scores between target word(s) and one or two vectors of candidate words
get_cos_sim: Given a tokenized corpus, compute the cosine similarities of the resulting ALC embeddings and a defined set of features
feature_sim: Given two feature-embedding matrices, compute "parallel" cosine similarities between overlapping features
fem-class: Virtual class "fem" for a feature-embedding matrix
get_seq_cos_sim: Calculate cosine similarities between a target word and candidate words over a sequenced variable using the ALC embedding approach
ncs: Given a set of embeddings and a set of tokenized contexts, find the top N nearest contexts
tokens_context: Get the tokens of contexts surrounding user-defined patterns
get_ncs: Given a set of tokenized contexts, find the top N nearest contexts
get_local_vocab: Identify words common to a collection of texts and a set of pre-trained embeddings
find_cos_sim: Find cosine similarities between target and candidate words
fem: Create a feature-embedding matrix
get_nns: Given a tokenized corpus and a set of candidate neighbors, find the top N nearest neighbors
get_nns_ratio: Given a corpus and a binary grouping variable, compute the ratio of cosine similarities over the union of their respective N nearest neighbors
permute_contrast: Permute similarity and ratio computations
plot_nns_ratio: Plot the output of get_nns_ratio()
prototypical_context: Find the most "prototypical" contexts
nns: Given a set of embeddings and a set of candidate neighbors, find the top N nearest neighbors
nns_ratio: Compute the ratio of cosine similarities for two embeddings over the union of their respective top N nearest neighbors
embed_target: Embed a target using either (a) the 'a la carte' transformation or (b) simple (untransformed) averaging of context embeddings
run_ols: Run OLS
bootstrap_contrast: Bootstrap similarity and ratio computations
compute_similarity: Compute a similarity vector (sub-function of bootstrap_similarity)
bootstrap_similarity: Bootstrap a similarity vector
build_conText: Build a conText-class object
build_fem: Build a fem-class object
compute_transform: Compute the transformation matrix A
compute_contrast: Compute similarity and similarity ratios
build_dem: Build a dem-class object
cr_transform: Transformation matrix
dem: Build a document-embedding matrix
cr_glove_subset: GloVe subset
bootstrap_nns: Bootstrap nearest neighbors
conText-package: conText: 'a la Carte' on Text (ConText) Embedding Regression
dem_group: Average document embeddings in a dem by a grouping variable
conText-class: Virtual class "conText" for a conText regression output
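
To give a sense of how several of these functions fit together, here is a rough workflow sketch using the bundled sample objects (argument names follow earlier releases and may differ slightly in 3.0.0):

library(conText)
library(quanteda)

toks <- tokens(cr_sample_corpus, remove_punct = TRUE)

# tokens of the contexts surrounding mentions of "immigration"
immig_toks <- tokens_context(x = toks, pattern = "immigr*", window = 6L)

# document-embedding matrix: one ALC embedding per context
immig_dem <- dem(x = immig_toks, pre_trained = cr_glove_subset,
                 transform = TRUE, transform_matrix = cr_transform, verbose = FALSE)

# average the context embeddings by party to get one embedding per group
immig_by_party <- dem_group(immig_dem, groups = immig_dem@docvars$party)

# nearest neighbors of each party's "immigration" embedding
nns(immig_by_party, pre_trained = cr_glove_subset, N = 10,
    candidates = immig_by_party@features)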