
text2vec (version 0.2.0)

glove: Fit a GloVe word embeddings model.

Description

Train a GloVe word embeddings model via fully asynchronous parallel AdaGrad.

Usage

glove(tcm, vocabulary_size, word_vectors_size, x_max, num_iters,
  shuffle_seed = NA_integer_, learning_rate = 0.05, verbose = TRUE,
  convergence_threshold = 0, grain_size = 100000L, max_cost = 10,
  alpha = 0.75, ...)

## S3 method for class 'dgTMatrix'
glove(tcm, vocabulary_size = nrow(tcm), word_vectors_size, x_max,
  num_iters, shuffle_seed = NA_integer_, learning_rate = 0.05,
  verbose = TRUE, convergence_threshold = -1, grain_size = 100000L,
  max_cost = 10, alpha = 0.75, ...)

## S3 method for class 'Matrix'
glove(tcm, ...)
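A minimal call sketch, assuming `tcm` already holds a dgTMatrix term co-occurrence matrix (see the full worked example under Examples):

fit <- glove(tcm = tcm, word_vectors_size = 50,
             x_max = 10, num_iters = 15)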

Arguments

tcm
object that represents the term co-occurrence matrix used in training. At the moment only dgTMatrix (or objects coercible to dgTMatrix) is supported. In future releases we will add support for out-of-core learning and streaming the TCM from disk.
vocabulary_size
number of words in the underlying term co-occurrence matrix
word_vectors_size
desired dimension for word vectors
x_max
maximum number of co-occurrences to use in the weighting function. See the GloVe paper for details: http://nlp.stanford.edu/pubs/glove.pdf
num_iters
number of AdaGrad epochs
shuffle_seed
integer seed for shuffling the input before each SGD iteration. Shuffling is generally a good idea, but in my experience it does not improve convergence in this particular case, so there is no shuffling by default: shuffle_seed = NA_integer_
learning_rate
learning rate for SGD. Modifying this parameter is not recommended; AdaGrad will quickly adjust it to an optimal value.
verbose
whether to display training information
convergence_threshold
defines the early stopping strategy. Fitting stops when one of the two following conditions is satisfied:

a) all num_iters iterations are spent,

or

b) cost_previous_iter / cost_current_iter - 1 < convergence_threshold

grain_size
the grain size for RcppParallel::parallelReduce; adjusting this parameter is not recommended. See http://rcppcore.github.io/RcppParallel/#grain-size for details.
max_cost
the maximum absolute value of the calculated gradient for any single co-occurrence pair. Try setting it to a smaller value if you run into problems with numerical stability.
alpha
the alpha in the weighting function formula: $f(x) = (x / x_{max})^{\alpha}$ if $x < x_{max}$, else $f(x) = 1$. See the sketch after this argument list.
...
arguments passed to other methods (not used at the moment)
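The interplay of x_max and alpha is easiest to see in code. Below is a minimal sketch of the weighting function described above; glove_weight is a hypothetical helper for illustration, not part of the text2vec API.

glove_weight <- function(x, x_max = 10, alpha = 0.75) {
  # f(x) = (x / x_max)^alpha for co-occurrence counts below x_max, else 1
  ifelse(x < x_max, (x / x_max)^alpha, 1)
}
glove_weight(c(1, 5, 10, 100))
# 0.1778279 0.5946036 1.0000000 1.0000000

Rare co-occurrences are down-weighted, while counts at or above x_max all receive the maximum weight of 1.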

Methods (by class)

  • dgTMatrix: fits the GloVe model on a dgTMatrix, a sparse matrix in triplet form
  • Matrix: fits the GloVe model on other Matrix input (see the coercion sketch below)
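A Matrix input has to reach triplet form before fitting; the following sketch shows the kind of coercion involved, using only the Matrix package (this is an assumption about the method's internals, not code from text2vec).

library(Matrix)
# sparseMatrix() returns a dgCMatrix (column-compressed) by default;
# the dgTMatrix method works on the triplet representation.
m <- sparseMatrix(i = c(1, 2, 2), j = c(2, 1, 3), x = c(1, 2, 3))
tcm_triplet <- as(m, "dgTMatrix")
class(tcm_triplet)  # "dgTMatrix"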

See Also

http://nlp.stanford.edu/projects/glove/

Examples

library(text2vec)
library(readr)    # read_lines()
library(stringr)  # str_split(), fixed()
library(magrittr) # %>%

# read the text8 corpus, one line per document
text8 <- read_lines('./text8')

# first pass over the corpus: build and prune the vocabulary
it <- itoken(text8, preprocess_function = identity,
             tokenizer = function(x) str_split(x, fixed(" ")))
vocab <- vocabulary(it) %>%
  prune_vocabulary(term_count_min = 5)

# second pass: build the term co-occurrence matrix
it <- itoken(text8, preprocess_function = identity,
             tokenizer = function(x) str_split(x, fixed(" ")))
corpus <- create_vocab_corpus(iterator = it,
                              vocabulary = vocab,
                              grow_dtm = FALSE,
                              skip_grams_window = 5)
tcm <- get_tcm(corpus)

# fit the GloVe model on 8 threads
RcppParallel::setThreadOptions(numThreads = 8)
fit <- glove(tcm = tcm, shuffle_seed = 1L, word_vectors_size = 50,
             x_max = 10, learning_rate = 0.2,
             num_iters = 50, grain_size = 1e5,
             max_cost = 100, convergence_threshold = 0.01)

# combine main and context word vectors, as suggested in the GloVe paper
word_vectors <- fit$word_vectors[[1]] + fit$word_vectors[[2]]
rownames(word_vectors) <- rownames(tcm)

# evaluate on the standard word-analogy questions
qlst <- prepare_analogue_questions('./questions-words.txt', rownames(word_vectors))
res <- check_analogue_accuracy(questions_lst = qlst, m_word_vectors = word_vectors)
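Beyond analogy accuracy, the fitted vectors can be queried directly. Below is a minimal cosine-similarity sketch; the query word "paris" is an assumption about what survives vocabulary pruning in text8.

query <- word_vectors["paris", , drop = FALSE]
cos_sim <- tcrossprod(word_vectors, query) /
  (sqrt(rowSums(word_vectors ^ 2)) * sqrt(sum(query ^ 2)))
head(sort(cos_sim[, 1], decreasing = TRUE), 5)  # nearest neighbours of "paris"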
