GloVe: Global Vectors

Description

Creates a Global Vectors (GloVe) matrix factorization model.

Arguments

Public fields

components: represents context embeddings
bias_i: bias term i as per paper
bias_j: bias term j as per paper
shuffle: logical, FALSE by default. Whether to shuffle the data before each SGD iteration. Shuffling is generally good practice for SGD.

Methods

Public methods

Method `new()`

Creates GloVe model object

Usage

GloVe$new(
  rank,
  x_max,
  learning_rate = 0.15,
  alpha = 0.75,
  lambda = 0,
  shuffle = FALSE,
  init = list(w_i = NULL, b_i = NULL, w_j = NULL, b_j = NULL)
)

Arguments

rank: desired dimension for the latent vectors

x_max

integer maximum number of co-occurrences to use in the weighting function

learning_rate

numeric learning rate for SGD. Changing this parameter is not recommended, since AdaGrad will quickly adjust it toward a suitable value.

alpha

numeric = 0.75, the exponent alpha in the weighting function: \(f(x) = 1\) if \(x > x_{\max}\), otherwise \(f(x) = (x / x_{\max})^{\alpha}\)
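The weighting function can be written out directly to see how x_max and alpha interact. The following is a minimal base R sketch of the formula as stated above; the helper name `glove_weight` is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of rsparse): the weighting function from the
# formula above, vectorised over co-occurrence counts x.
glove_weight <- function(x, x_max, alpha = 0.75) {
  # Counts above x_max get full weight 1; smaller counts are down-weighted.
  ifelse(x > x_max, 1, (x / x_max)^alpha)
}

# Illustrative counts, chosen arbitrarily
counts <- c(1, 5, 10, 50)
weights <- glove_weight(counts, x_max = 10)
print(weights)
```

Weights never exceed 1, so very frequent co-occurrences cannot dominate the loss, while rare ones contribute less.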

lambda

numeric = 0.0 regularization parameter

shuffle

see shuffle field

init

list(w_i = NULL, b_i = NULL, w_j = NULL, b_j = NULL) initialization for the embeddings (w_i, w_j) and biases (b_i, b_j). w_i and w_j are numeric matrices with rank rows; w_i should have as many columns as the expected number of rows in the input matrix, and w_j as many columns as the expected number of columns in the input matrix. b_i and b_j are numeric vectors whose lengths equal the expected number of rows (b_i) and columns (b_j) of the input matrix.
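The shapes required for init are easy to get wrong, so here is a sketch that builds a list with the dimensions described above. The sizes `n_rows` and `n_cols` and the uniform range used for the random values are illustrative assumptions, not package defaults.

```r
# Illustrative sizes (assumptions): a 100 x 100 co-occurrence matrix, rank 4.
rank   <- 4L
n_rows <- 100L
n_cols <- 100L

set.seed(42)
# The range [-0.5, 0.5] is an arbitrary choice for this sketch.
init <- list(
  w_i = matrix(runif(rank * n_rows, -0.5, 0.5), nrow = rank, ncol = n_rows),
  b_i = runif(n_rows, -0.5, 0.5),
  w_j = matrix(runif(rank * n_cols, -0.5, 0.5), nrow = rank, ncol = n_cols),
  b_j = runif(n_cols, -0.5, 0.5)
)

# Check the shapes against the description above before using the list.
stopifnot(
  identical(dim(init$w_i), c(rank, n_rows)),
  identical(dim(init$w_j), c(rank, n_cols)),
  length(init$b_i) == n_rows,
  length(init$b_j) == n_cols
)

# The list could then be passed to the constructor, for example:
# GloVe$new(rank = rank, x_max = 10, init = init)
```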

Method `fit_transform()`

Fits the model and returns the embeddings.

Usage

GloVe$fit_transform(
  x,
  n_iter = 10L,
  convergence_tol = -1,
  n_threads = getOption("rsparse_omp_threads", 1L),
  ...
)

Arguments

x: An input term co-occurrence matrix, preferably in dgTMatrix format.

n_iter

integer number of SGD iterations

convergence_tol

numeric = -1, defines the early stopping strategy. Fitting stops when either of the following conditions is satisfied: (a) all iterations have been completed, or (b) cost_previous_iter / cost_current_iter - 1 < convergence_tol.
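The stopping rule can be illustrated without fitting a model. The sketch below applies condition (b) to a hypothetical vector of per-iteration costs; the function name `iterations_run` and the cost values are made up for illustration. For positive costs the ratio cost_previous_iter / cost_current_iter is positive, so with the default of -1 condition (b) cannot hold and all iterations run.

```r
# Hypothetical helper: returns the iteration at which fitting would stop,
# given the cost observed after each iteration.
iterations_run <- function(costs, convergence_tol = -1) {
  for (i in seq_along(costs)[-1]) {
    relative_improvement <- costs[i - 1] / costs[i] - 1
    if (relative_improvement < convergence_tol) {
      return(i)  # condition (b): improvement fell below the tolerance
    }
  }
  length(costs)  # condition (a): all iterations were completed
}

# Made-up costs for five iterations
costs <- c(0.90, 0.50, 0.40, 0.398, 0.397)

stopped_early <- iterations_run(costs, convergence_tol = 0.01)
ran_all       <- iterations_run(costs)  # default tolerance of -1
print(c(stopped_early = stopped_early, ran_all = ran_all))
```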

n_threads

number of threads to use

...

not used at the moment

Method `get_history()`

Returns the value of the loss function for each epoch.

Usage

GloVe$get_history()
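A short usage sketch, assuming the rsparse package and its movielens100k dataset are installed. The exact structure of the returned object is not described here, so the example inspects it with `str()` rather than assuming a particular shape.

```r
# Assumption: rsparse is installed; the block is skipped otherwise.
loss_history <- NULL
if (requireNamespace("rsparse", quietly = TRUE)) {
  library(rsparse)
  data("movielens100k")
  co_occurrence <- crossprod(movielens100k)
  glove_model <- GloVe$new(rank = 4, x_max = 10)
  embeddings <- glove_model$fit_transform(co_occurrence, n_iter = 5L, n_threads = 1L)
  # Loss recorded during fitting; inspect the structure before relying on it.
  loss_history <- glove_model$get_history()
  str(loss_history)
}
```

Plotting or printing the recorded losses is a quick way to judge whether n_iter is large enough.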

Method `clone()`

The objects of this class are cloneable with this method.

Usage

GloVe$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

http://nlp.stanford.edu/projects/glove/

Examples

# NOT RUN {
library(rsparse)
data('movielens100k')
# item-item co-occurrence matrix
co_occurence = crossprod(movielens100k)
glove_model = GloVe$new(rank = 4, x_max = 10, learning_rate = .25)
embeddings = glove_model$fit_transform(co_occurence, n_iter = 2, n_threads = 1)
embeddings = embeddings + t(glove_model$components) # embeddings + context embeddings
# one row per column of the input, one column per latent dimension (rank = 4)
identical(dim(embeddings), c(ncol(movielens100k), 4L))
# }
