Train a doc2vec model (Le & Mikolov, 2014) using a quanteda::tokens object.
Usage:

textmodel_doc2vec(
x,
dim = 50,
type = c("dm", "dbow"),
min_count = 5,
window = 5,
iter = 10,
alpha = 0.05,
model = NULL,
use_ns = TRUE,
ns_size = 5,
sample = 0.001,
tolower = TRUE,
include_data = FALSE,
verbose = FALSE,
...
)

Value:

Returns a textmodel_doc2vec object with word and document vectors stored as matrices in values.
Other elements are the same as textmodel_word2vec.
Arguments:

x: a quanteda::tokens or quanteda::tokens_xptr object.

dim: the size of the word vectors.

type: the architecture of the model; either "dm" (distributed memory) or "dbow" (distributed bag-of-words).

min_count: the minimum frequency of words. Words less frequent than this in x are removed before training.

window: the size of the window for context words. Ignored when type = "dbow", as its context window is the entire document (sentence or paragraph).

iter: the number of iterations in model training.

alpha: the initial learning rate.

model: a trained Word2vec model; if provided, its word vectors are updated for x.

use_ns: if TRUE, negative sampling is used; otherwise, hierarchical softmax is used.

ns_size: the size of negative samples. Only used when use_ns = TRUE.

sample: the rate of sampling of words based on their frequency. Sampling is disabled when sample = 1.0.

tolower: lower-case all the tokens before fitting the model.

include_data: if TRUE, the resulting object includes the data supplied as x.

verbose: if TRUE, print the progress of training.

...: additional arguments.
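A minimal usage sketch. It assumes textmodel_doc2vec() is loaded from its providing package (here assumed to be wordvector, alongside quanteda); data_corpus_inaugural ships with quanteda, and the exact layout of the values element is not specified here, so it is only inspected with str():

```r
# Sketch: package names and object layout are assumptions, not confirmed by this page
library(quanteda)
library(wordvector)

# Tokenize a built-in quanteda corpus
toks <- tokens(data_corpus_inaugural,
               remove_punct = TRUE, remove_symbols = TRUE)

# Fit a distributed-memory doc2vec model with 50-dimensional vectors
d2v <- textmodel_doc2vec(toks, dim = 50, type = "dm", iter = 10)

# Word and document vectors are stored as matrices in values
str(d2v$values)
```

With type = "dbow", the window argument would be ignored, since the context window is the entire document.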
References:

Le, Q. V., & Mikolov, T. (2014). Distributed Representations of Sentences and Documents. arXiv:1405.4053. https://doi.org/10.48550/arXiv.1405.4053