
text2vec (version 0.2.0)

glove: Fit a GloVe word embeddings model.

Description

Train a GloVe word embeddings model via fully asynchronous parallel AdaGrad.

Usage

glove(tcm, vocabulary_size, word_vectors_size, x_max, num_iters,
  shuffle_seed = NA_integer_, learning_rate = 0.05, verbose = TRUE,
  convergence_threshold = 0, grain_size = 100000L, max_cost = 10,
  alpha = 0.75, ...)

## S3 method for class 'dgTMatrix'
glove(tcm, vocabulary_size = nrow(tcm), word_vectors_size, x_max,
  num_iters, shuffle_seed = NA_integer_, learning_rate = 0.05,
  verbose = TRUE, convergence_threshold = -1, grain_size = 100000L,
  max_cost = 10, alpha = 0.75, ...)

## S3 method for class 'Matrix'
glove(tcm, ...)
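A minimal call sketch, assuming `tcm` already holds a dgTMatrix term co-occurrence matrix (see the full worked example under Examples):

fit <- glove(tcm = tcm, word_vectors_size = 50,
             x_max = 10, num_iters = 15)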

Arguments

tcm
object that represents the term co-occurrence matrix used in training. At the moment only dgTMatrix (or objects coercible to dgTMatrix) is supported. In future releases we will add support for out-of-core learning and streaming the TCM from disk.
vocabulary_size
number of words in the underlying term co-occurrence matrix
word_vectors_size
desired dimension for word vectors
x_max
maximum number of co-occurrences to use in the weighting function. See the GloVe paper for details: http://nlp.stanford.edu/pubs/glove.pdf
num_iters
number of AdaGrad epochs
shuffle_seed
integer seed for shuffling the input before each SGD iteration. Shuffling is generally a good idea, but in my experience it does not improve convergence in this particular case, so there is no shuffling by default: shuffle_seed = NA_integer_
learning_rate
learning rate for SGD. Modifying this parameter is not recommended; AdaGrad will quickly adjust it to an optimal value.
verbose
whether to display training information
convergence_threshold
defines the early stopping strategy. Fitting stops when one of the two following conditions is satisfied:

a) all num_iters iterations are spent,

or

b) cost_previous_iter / cost_current_iter - 1 < convergence_threshold

grain_size
the grain size for RcppParallel::parallelReduce; adjusting this parameter is not recommended. See http://rcppcore.github.io/RcppParallel/#grain-size for details.
max_cost
the maximum absolute value of the calculated gradient for any single co-occurrence pair. Try setting it to a smaller value if you run into problems with numerical stability.
alpha
the alpha in the weighting function formula: $f(x) = (x / x_{max})^{\alpha}$ if $x < x_{max}$, else $f(x) = 1$. See the sketch after this argument list.
...
arguments passed to other methods (not used at the moment)
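The interplay of x_max and alpha is easiest to see in code. Below is a minimal sketch of the weighting function described above; glove_weight is a hypothetical helper for illustration, not part of the text2vec API.

glove_weight <- function(x, x_max = 10, alpha = 0.75) {
  # f(x) = (x / x_max)^alpha for co-occurrence counts below x_max, else 1
  ifelse(x < x_max, (x / x_max)^alpha, 1)
}
glove_weight(c(1, 5, 10, 100))
# 0.1778279 0.5946036 1.0000000 1.0000000

Rare co-occurrences are down-weighted, while counts at or above x_max all receive the maximum weight of 1.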

Methods (by class)

  • dgTMatrix: fits the GloVe model on a dgTMatrix, a sparse matrix in triplet form
  • Matrix: fits the GloVe model on other Matrix input (see the coercion sketch below)
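A Matrix input has to reach triplet form before fitting; the following sketch shows the kind of coercion involved, using only the Matrix package (this is an assumption about the method's internals, not code from text2vec).

library(Matrix)
# sparseMatrix() returns a dgCMatrix (column-compressed) by default;
# the dgTMatrix method works on the triplet representation.
m <- sparseMatrix(i = c(1, 2, 2), j = c(2, 1, 3), x = c(1, 2, 3))
tcm_triplet <- as(m, "dgTMatrix")
class(tcm_triplet)  # "dgTMatrix"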

See Also

http://nlp.stanford.edu/projects/glove/

Examples

library(text2vec)
library(readr)    # read_lines()
library(stringr)  # str_split(), fixed()
library(magrittr) # %>%

# read the text8 corpus, one line per document
text8 <- read_lines('./text8')

# first pass over the corpus: build and prune the vocabulary
it <- itoken(text8, preprocess_function = identity,
             tokenizer = function(x) str_split(x, fixed(" ")))
vocab <- vocabulary(it) %>%
  prune_vocabulary(term_count_min = 5)

# second pass: build the term co-occurrence matrix
it <- itoken(text8, preprocess_function = identity,
             tokenizer = function(x) str_split(x, fixed(" ")))
corpus <- create_vocab_corpus(iterator = it,
                              vocabulary = vocab,
                              grow_dtm = FALSE,
                              skip_grams_window = 5)
tcm <- get_tcm(corpus)

# fit the GloVe model on 8 threads
RcppParallel::setThreadOptions(numThreads = 8)
fit <- glove(tcm = tcm, shuffle_seed = 1L, word_vectors_size = 50,
             x_max = 10, learning_rate = 0.2,
             num_iters = 50, grain_size = 1e5,
             max_cost = 100, convergence_threshold = 0.01)

# combine main and context word vectors, as suggested in the GloVe paper
word_vectors <- fit$word_vectors[[1]] + fit$word_vectors[[2]]
rownames(word_vectors) <- rownames(tcm)

# evaluate on the standard word-analogy questions
qlst <- prepare_analogue_questions('./questions-words.txt', rownames(word_vectors))
res <- check_analogue_accuracy(questions_lst = qlst, m_word_vectors = word_vectors)
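Beyond analogy accuracy, the fitted vectors can be queried directly. Below is a minimal cosine-similarity sketch; the query word "paris" is an assumption about what survives vocabulary pruning in text8.

query <- word_vectors["paris", , drop = FALSE]
cos_sim <- tcrossprod(word_vectors, query) /
  (sqrt(rowSums(word_vectors ^ 2)) * sqrt(sum(query ^ 2)))
head(sort(cos_sim[, 1], decreasing = TRUE), 5)  # nearest neighbours of "paris"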
