
Note: a newer version (3.0.0) of this package is available; this page documents version 1.4.3.

About

conText provides a fast, flexible and transparent framework to estimate context-specific word and short-document embeddings using the à la carte (ALC) embeddings approach developed by Khodak et al. (2018), and to evaluate hypotheses about covariate effects on those embeddings using the regression framework developed by Rodriguez et al. (2021).
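To make the two ideas in the paragraph above concrete, here is a minimal base-R sketch. It does not call conText: the vocabulary, the 3-dimensional embeddings, the identity transformation matrix `A`, the binary covariate `group`, and the helpers `context_window()` and `alc_embed()` are all hypothetical. Each context of a target word is embedded by averaging the pre-trained embeddings of the surrounding words and multiplying by `A`; the resulting instance embeddings are then regressed on a covariate with multivariate OLS, and the Euclidean norm of the covariate's coefficient vector is used as one possible summary of its effect.

```r
# Toy sketch of the ideas behind conText, using base R only.
# All objects below are hypothetical stand-ins, not the package's data.
set.seed(1)

# Hypothetical pre-trained embeddings (one row per word, d = 3)
vocab <- c("border", "wall", "policy", "visa", "jobs", "reform")
pre_trained <- matrix(rnorm(length(vocab) * 3), nrow = length(vocab),
                      dimnames = list(vocab, NULL))

# Hypothetical transformation matrix A (d x d); identity for illustration
A <- diag(3)

# Context words within a symmetric window around each occurrence of `target`
context_window <- function(tokens, target, window = 2) {
  hits <- which(tokens == target)
  lapply(hits, function(i) {
    idx <- setdiff(max(1, i - window):min(length(tokens), i + window), i)
    tokens[idx]
  })
}

# ALC embedding of one context: average the pre-trained embeddings of the
# in-vocabulary context words (repeated words count twice), then multiply by A
alc_embed <- function(context_words, pre_trained, A) {
  words <- context_words[context_words %in% rownames(pre_trained)]
  u <- colMeans(pre_trained[words, , drop = FALSE])
  as.vector(A %*% u)
}

# Four hypothetical contexts of a target word plus a binary covariate
contexts <- list(c("border", "wall", "policy"), c("border", "visa"),
                 c("jobs", "reform", "policy"), c("jobs", "reform"))
group <- c(1, 1, 0, 0)

# n x d matrix of instance embeddings (one row per context)
Y <- t(sapply(contexts, alc_embed, pre_trained = pre_trained, A = A))

# Multivariate OLS: every embedding dimension regressed on the covariate
B <- coef(lm(Y ~ group))   # rows: "(Intercept)", "group"; columns: dimensions

# Euclidean norm of the covariate's coefficient vector as an effect size
effect_norm <- sqrt(sum(B["group", ]^2))
```

With a single binary covariate, the `group` row of `B` is simply the difference between the two groups' mean embeddings, which makes a handy sanity check. The package's own functions cover what this sketch leaves out, such as starting from a quanteda corpus and running bootstrap and permutation inference.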

How to Install

install.packages("conText")

Datasets

To use conText you will need three objects:

A (quanteda) corpus with the documents and corresponding document variables you want to evaluate.
A set of (GloVe) pre-trained embeddings.
A transformation matrix specific to the pre-trained embeddings.

conText includes sample versions of all three objects, but keep in mind that these are only meant to illustrate how the functions are used. In a Dropbox folder, we have included the raw versions of these objects, including the full Stanford GloVe 300-dimensional embeddings (labeled glove.rds) and the corresponding transformation matrix estimated by Khodak et al. (2018) (labeled khodakA.rds).
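The transformation matrix is what turns a simple average of context embeddings into an à la carte embedding. As a rough sketch of how such a matrix can be estimated, the base-R snippet below recovers `A` by plain (unweighted) least squares from simulated data, so that `A %*% u_w` approximates `v_w`, where `u_w` is the average embedding of word w's contexts and `v_w` is its pre-trained embedding. The simulated `U`, `V`, and `A_true` are hypothetical; this is not how khodakA.rds was produced, and the package's own estimation routine may differ (for example, in how words are weighted).

```r
# Sketch: estimate a transformation matrix A by ordinary least squares so that
# A %*% u_w approximates v_w for every word w (the ALC objective).
# Assumption: toy data simulated from a known A; real inputs would be the
# pre-trained embeddings and each word's averaged context embedding.
set.seed(42)
d <- 4        # embedding dimension
n <- 200      # number of vocabulary words

A_true <- matrix(rnorm(d * d), d, d)    # hypothetical "true" A
U <- matrix(rnorm(n * d), n, d)         # row w: average context embedding u_w
# Row w of V: pre-trained embedding v_w, approx. A_true %*% u_w plus small noise
V <- U %*% t(A_true) + matrix(rnorm(n * d, sd = 0.01), n, d)

# Minimize sum_w ||v_w - A u_w||^2, i.e. ||V - U t(A)||_F^2.
# Normal equations: t(A) = (U'U)^{-1} U'V
A_hat <- t(solve(crossprod(U), crossprod(U, V)))

# Largest absolute estimation error; small because the noise is small
max(abs(A_hat - A_true))
```

Solving the normal equations with `crossprod()` and `solve()` keeps the sketch short; with real vocabularies one would typically check that `U'U` is well conditioned first.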

Quick Start Guides

Check out this Quick Start Guide to get going with conText.

Monthly Downloads

338

Version

1.4.3

License

GPL-3

Repository

https://github.com/prodriguezsosa/EmbeddingRegression

Maintainer

Pedro L. Rodriguez

Last Published

February 9th, 2023

Functions in conText (1.4.3)

Bootstrap nearest neighbors

cr_glove_subset

Virtual class "dem" for a document-embedding matrix

cr_sample_corpus

Congressional Record sample corpus

Transformation matrix

Average document-embeddings in a dem by a grouping variable

Find cosine similarities between target and candidate words

Build a document-embedding matrix

Embedding regression

Virtual class "conText" for a conText regression output

Create a feature-embedding matrix

Given a tokenized corpus, compute the cosine similarities of the resulting ALC embeddings and a defined set of features.

Get context words (words within a symmetric window around the target word/phrase) surrounding a user-defined target.

Return nearest neighbors based on cosine similarity

Virtual class "fem" for a feature-embedding matrix

Contrast nearest neighbors

get_local_vocab

Identify words common to a collection of texts and a set of pretrained embeddings.

Given two feature-embedding-matrices, compute "parallel" cosine similarities between overlapping features.

Compute the cosine similarity between one or more ALC embeddings and a set of features.

Embed a target using either (a) à la carte (ALC) embedding or (b) simple (untransformed) averaging of context embeddings

Randomly sample documents from a dem

Computes the ratio of cosine similarities for two embeddings over the union of their respective top N nearest neighbors.

get_seq_cos_sim

Calculate cosine similarities between a target word and candidate words over a sequenced variable using the ALC embedding approach

permute_contrast

Permute similarity and ratio computations

Given a corpus and a binary grouping variable, compute the ratio of cosine similarities over the union of the two groups' respective top N nearest neighbors.

Plot output of get_nns_ratio()

Given a set of embeddings and a set of tokenized contexts, find the top N nearest contexts.

Get the tokens of contexts surrounding user-defined patterns

Given a set of embeddings and a set of candidate neighbors, find the top N nearest neighbors.

Given a tokenized corpus and a set of candidate neighbors, find the top N nearest neighbors.

prototypical_context

Find most "prototypical" contexts.

Given a set of tokenized contexts, find the top N nearest contexts.

bootstrap_similarity

Bootstrap similarity vector

Build a fem-class object

compute_similarity

Compute similarity vector (sub-function of bootstrap_similarity)

compute_contrast

Compute similarity and similarity ratios

compute_transform

Compute transformation matrix A

Build a dem-class object

Build a conText-class object

bootstrap_contrast

Bootstrap similarity and ratio computations
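Several of the functions listed above revolve around cosine similarity, nearest neighbors, and the ratio of cosine similarities over the union of two embeddings' top N nearest neighbors. The base-R sketch below illustrates those three computations on hypothetical 2-dimensional toy embeddings; `cos_sim_mat()`, `top_nns()`, and `nns_ratio_sketch()` are illustrative names, not conText functions, and their internals are not taken from the package.

```r
# Hypothetical helpers illustrating cosine similarity, nearest neighbors and
# the nearest-neighbor ratio; these are not the conText implementations.

# Cosine similarities between rows of x (k x d) and rows of y (m x d)
cos_sim_mat <- function(x, y) {
  xn <- x / sqrt(rowSums(x^2))   # scale each row to unit length
  yn <- y / sqrt(rowSums(y^2))
  xn %*% t(yn)                   # k x m matrix of cosine similarities
}

# Top-N nearest neighbors of a single embedding among the rows of pre_trained
top_nns <- function(emb, pre_trained, N = 5) {
  sims <- cos_sim_mat(matrix(emb, nrow = 1), pre_trained)[1, ]
  names(sort(sims, decreasing = TRUE))[seq_len(N)]
}

# Ratio of cosine similarities over the union of both top-N neighbor sets
nns_ratio_sketch <- function(emb_a, emb_b, pre_trained, N = 5) {
  feats <- union(top_nns(emb_a, pre_trained, N), top_nns(emb_b, pre_trained, N))
  cand <- pre_trained[feats, , drop = FALSE]
  sim_a <- cos_sim_mat(matrix(emb_a, nrow = 1), cand)[1, ]
  sim_b <- cos_sim_mat(matrix(emb_b, nrow = 1), cand)[1, ]
  sort(sim_a / sim_b, decreasing = TRUE)   # values > 1: closer to emb_a
}

# Toy 2-dimensional embeddings (hypothetical)
pre_trained <- rbind(a = c(1, 0), b = c(0.9, 0.1), c = c(0, 1),
                     d = c(0.1, 0.9), e = c(0.7, 0.7))
emb_a <- c(1, 0.05)
emb_b <- c(0.05, 1)

top_nns(emb_a, pre_trained, N = 2)
nns_ratio_sketch(emb_a, emb_b, pre_trained, N = 2)
```

A ratio above 1 means a feature is closer to the first embedding than to the second. Because cosine similarities can be negative, such ratios are only straightforward to read when both similarities are positive, as they are in this toy example.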