About

conText provides a fast, flexible and transparent framework to estimate context-specific word and short document embeddings using the 'a la carte' embeddings approach developed by Khodak et al. (2018), and to evaluate hypotheses about covariate effects on embeddings using the regression framework developed by Rodriguez et al. (2021).
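
As a rough illustration of the 'a la carte' idea (average the pre-trained embeddings of the words around a target, then apply a transformation matrix), here is a minimal base R sketch. It does not use conText itself. The toy embeddings, the matrix `A`, and the helper names `alc_embed` and `cos_sim_vec` are all made up for illustration.

```r
# Toy pre-trained embeddings: 4 words, 2 dimensions (made-up numbers).
pre_trained <- rbind(
  bank  = c(1, 0),
  river = c(0, 1),
  money = c(1, 1),
  water = c(0, 2)
)

# Toy transformation matrix A (2 x 2). In practice A is estimated from a
# large corpus; here it is an arbitrary diagonal matrix.
A <- diag(c(2, 0.5))

# Hypothetical helper: average the context word vectors, then apply A.
alc_embed <- function(context_words, pre_trained, A) {
  # Keep only context words that have a pre-trained embedding.
  in_vocab <- context_words[context_words %in% rownames(pre_trained)]
  stopifnot(length(in_vocab) > 0)
  avg <- colMeans(pre_trained[in_vocab, , drop = FALSE])
  as.vector(A %*% avg)
}

# Hypothetical helper: cosine similarity between two numeric vectors.
cos_sim_vec <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Embed the same target from two different sets of context words.
ctx_river <- alc_embed(c("river", "water"), pre_trained, A)
ctx_money <- alc_embed(c("money", "bank"), pre_trained, A)

print(ctx_river)
print(ctx_money)
print(cos_sim_vec(ctx_river, ctx_money))
```

Dropping the multiplication by `A` gives the simple (untransformed) averaging that `embed_target` lists as its alternative option.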

How to Install

install.packages("conText")

Datasets

To use conText you will need three objects:

  1. A (quanteda) corpus with the documents and corresponding document variables you want to evaluate.
  2. A set of (GloVe) pre-trained embeddings.
  3. A transformation matrix specific to the pre-trained embeddings.

conText includes sample objects for all three, but keep in mind that these are meant only to illustrate function implementations. In this Dropbox folder we have included the raw versions of these objects, including the full Stanford GloVe 300-dimensional embeddings (labeled glove.rds) and the corresponding transformation matrix estimated by Khodak et al. (2018) (labeled khodakA.rds).
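
The three sample objects can be inspected as follows. This is a minimal sketch that assumes conText and quanteda are installed. The commented-out `readRDS()` calls assume the full files were downloaded to the working directory; that location is an assumption, not something the package prescribes.

```r
library(conText)
library(quanteda)

# 1. A quanteda corpus with document variables (sample shipped with conText).
print(ndoc(cr_sample_corpus))
print(head(docvars(cr_sample_corpus)))

# 2. A subset of the GloVe pre-trained embeddings (one row per word).
print(dim(cr_glove_subset))

# 3. The transformation matrix that goes with those embeddings.
print(dim(cr_transform))

# Assumption: glove.rds and khodakA.rds sit in the working directory.
# glove   <- readRDS("glove.rds")
# khodakA <- readRDS("khodakA.rds")
```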

Quick Start Guides

Check out this Quick Start Guide to get going with conText.

Monthly Downloads

298

Version

1.4.3

License

GPL-3

Maintainer

Pedro L. Rodriguez

Last Published

February 9th, 2023

Functions in conText (1.4.3)

bootstrap_nns

Bootstrap nearest neighbors
cr_glove_subset

GloVe subset
dem-class

Virtual class "dem" for a document-embedding matrix
cr_sample_corpus

Congressional Record sample corpus
cr_transform

Transformation matrix
dem_group

Average document-embeddings in a dem by a grouping variable
find_cos_sim

Find cosine similarities between target and candidate words
dem

Build a document-embedding matrix
conText

Embedding regression
conText-class

Virtual class "conText" for a conText regression output
fem

Create a feature-embedding matrix
get_cos_sim

Given a tokenized corpus, compute the cosine similarities of the resulting ALC embeddings and a defined set of features.
get_context

Get context words (words within a symmetric window around the target word/phrase) surrounding a user-defined target.
find_nns

Return nearest neighbors based on cosine similarity
fem-class

Virtual class "fem" for a feature-embedding matrix
contrast_nns

Contrast nearest neighbors
get_local_vocab

Identify words common to a collection of texts and a set of pretrained embeddings.
feature_sim

Given two feature-embedding-matrices, compute "parallel" cosine similarities between overlapping features.
cos_sim

Compute the cosine similarity between one or more ALC embeddings and a set of features.
embed_target

Embed target using either: (a) a la carte OR (b) simple (untransformed) averaging of context embeddings
permute_ols

Permute OLS
dem_sample

Randomly sample documents from a dem
nns_ratio

Computes the ratio of cosine similarities for two embeddings over the union of their respective top N nearest neighbors.
get_seq_cos_sim

Calculate cosine similarities between a target word and candidate words over a sequenced variable using the ALC embedding approach
permute_contrast

Permute similarity and ratio computations
get_nns_ratio

Given a corpus and a binary grouping variable, computes the ratio of cosine similarities over the union of their respective N nearest neighbors.
plot_nns_ratio

Plot output of get_nns_ratio()
ncs

Given a set of embeddings and a set of tokenized contexts, find the top N nearest contexts.
tokens_context

Get the tokens of contexts surrounding user-defined patterns
nns

Given a set of embeddings and a set of candidate neighbors, find the top N nearest neighbors.
get_nns

Given a tokenized corpus and a set of candidate neighbors, find the top N nearest neighbors.
run_ols

Run OLS
prototypical_context

Find most "prototypical" contexts.
get_ncs

Given a set of tokenized contexts, find the top N nearest contexts.
bootstrap_similarity

Bootstrap similarity vector
bootstrap_ols

Bootstrap OLS
build_fem

Build a fem-class object
compute_similarity

Compute similarity vector (sub-function of bootstrap_similarity)
compute_contrast

Compute similarity and similarity ratios
compute_transform

Compute transformation matrix A
build_dem

Build a dem-class object
build_conText

Build a conText-class object
bootstrap_contrast

Bootstrap similarity and ratio computations
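
Several of the functions above are typically chained together. The sketch below shows one possible sequence using the sample objects: extract contexts, build a document-embedding matrix, average it by group, and look up nearest neighbors. The pattern `"immigr*"`, the window size, and the use of a `party` document variable are illustrative assumptions; check the argument names against the installed version's help pages before relying on them.

```r
library(conText)
library(quanteda)

# Tokenize the sample corpus.
toks <- tokens(cr_sample_corpus, remove_punct = TRUE)

# Contexts: 6 tokens on each side of any token matching "immigr*".
immig_toks <- tokens_context(x = toks, pattern = "immigr*", window = 6L)

# Document-feature matrix of the contexts.
immig_dfm <- dfm(immig_toks)

# Document-embedding matrix: one ALC embedding per context.
immig_dem <- dem(x = immig_dfm,
                 pre_trained = cr_glove_subset,
                 transform = TRUE,
                 transform_matrix = cr_transform,
                 verbose = FALSE)

# Average the embeddings within each party (assumes a `party` docvar).
immig_by_party <- dem_group(immig_dem, groups = immig_dem@docvars$party)

# Top 5 nearest neighbors for each group's embedding.
immig_nns <- nns(immig_by_party,
                 pre_trained = cr_glove_subset,
                 N = 5,
                 candidates = immig_by_party@features,
                 as_list = TRUE)
print(immig_nns)
```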