build_vectors: Build fasttext vectors

Description

Trains an unsupervised fastText word vector model, following the method described in Enriching Word Vectors with Subword Information and using the fastText implementation.

See the FastText word representation tutorial for more information on training unsupervised models with fastText.

Usage

build_vectors(documents, model_path, modeltype = c("skipgram", "cbow"),
  bucket = 2e+06, dim = 100, epoch = 5, label = "__label__",
  loss = c("ns", "hs", "softmax", "ova", "one-vs-all"), lr = 0.05,
  lrUpdateRate = 100, maxn = 6, minCount = 5, minn = 3, neg = 5,
  t = 1e-04, thread = 12, verbose = 2, wordNgrams = 1, ws = 5)

Arguments

documents

character vector of documents used for training

model_path

Name of output file without file extension.

modeltype

Should training be done using skipgram or cbow? Defaults to skipgram.

bucket

number of hash buckets used for n-grams

dim

size of word vectors

epoch

number of epochs

label

text string, labels prefix. Default is "__label__"

loss

loss function: ns, hs, softmax, ova or one-vs-all

lr

learning rate

lrUpdateRate

change the rate of updates for the learning rate

maxn

max length of char ngram

minCount

minimal number of word occurrences

minn

min length of char ngram

neg

number of negatives sampled

t

sampling threshold

thread

number of threads

verbose

verbosity level

wordNgrams

max length of word ngram

ws

size of the context window
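
The minn and maxn arguments bound the length of the character n-grams (subwords) that the model learns alongside whole words. The base R sketch below is only an illustration of that idea: char_ngrams is a hypothetical helper, not part of fastrtext, and it is a simplification of what the fastText implementation does internally. It assumes the word is wrapped in "<" and ">" boundary markers, as described in the subword paper.

```r
# Hypothetical helper for illustration only (not a fastrtext function):
# list the character n-grams of a word for every length from minn to maxn.
char_ngrams <- function(word, minn = 3, maxn = 6) {
  # Nothing to extract when the range is empty
  if (minn > maxn || minn < 1) return(character(0))

  # Assumption: "<" and ">" mark the beginning and end of the word
  padded <- paste0("<", word, ">")
  n_chars <- nchar(padded)

  grams <- character(0)
  for (n in minn:maxn) {
    # Stop once the n-gram would be longer than the padded word
    if (n > n_chars) break
    starts <- seq_len(n_chars - n + 1)
    grams <- c(grams, substring(padded, starts, starts + n - 1))
  }
  grams
}

# Trigrams of "where"
char_ngrams("where", minn = 3, maxn = 3)
```

With the defaults of build_vectors (minn = 3, maxn = 6), the same helper would also return the 4-, 5- and 6-character n-grams, so a wider range yields more subwords per word.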

Value

path to model file, as character

Examples

# NOT RUN {
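library(fastrtext)
# train_sentences is an example dataset shipped with fastrtext
text <- train_sentences
# train on the text column; 'my_model' is the output file name without extension
model_file <- build_vectors(text[['text']], 'my_model')
# load the trained model from the path returned by build_vectors
model <- load_model(model_file)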
# }
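
A variation with explicit arguments may help show how the options fit together. This is a sketch under stated assumptions: the argument values are illustrative rather than tuned recommendations, the model is written to a temporary path, and it assumes that get_dictionary and get_word_vectors from fastrtext are available to inspect the result. The whole block is skipped if fastrtext is not installed.

```r
# Sketch only: argument values are illustrative, not recommendations.
if (requireNamespace("fastrtext", quietly = TRUE)) {
  library(fastrtext)
  text <- train_sentences

  # Output path without file extension, placed in a temporary location
  model_path <- tempfile()

  model_file <- build_vectors(
    documents = tolower(text[["text"]]),
    model_path = model_path,
    modeltype = "skipgram",  # or "cbow"
    dim = 20,                # size of word vectors
    epoch = 3,               # number of epochs
    minCount = 2,            # drop words seen fewer than 2 times
    thread = 1,              # single thread
    verbose = 0              # silence training output
  )

  # build_vectors returns the path to the model file
  model <- load_model(model_file)

  # Assumption: these fastrtext helpers return the learned vocabulary
  # and a matrix with one row per requested word
  words <- head(get_dictionary(model), 5)
  vectors <- get_word_vectors(model, words)
  print(dim(vectors))
}
```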
