
doc2vec (version 0.1.1)

as.matrix.paragraph2vec: Get the document or word vectors of a paragraph2vec model

Description

Get the document or word vectors of a paragraph2vec model as a dense matrix.

Usage

# S3 method for paragraph2vec
as.matrix(
  x,
  which = c("docs", "words"),
  normalize = TRUE,
  encoding = "UTF-8",
  ...
)

Arguments

x

a paragraph2vec model as returned by paragraph2vec or read.paragraph2vec

which

either 'docs' to get the document vectors or 'words' to get the word vectors. Defaults to 'docs'.

normalize

logical indicating whether to normalize the embeddings. Defaults to TRUE.

encoding

the encoding to set on the row names of the returned matrix. Defaults to 'UTF-8'.

...

not used
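
To give a feel for what a normalized embedding matrix looks like, the sketch below scales each row of a toy matrix to unit Euclidean length. Whether normalize = TRUE applies exactly this scaling is an assumption, since this page does not say which normalization is used. The names toy and l2_normalize are hypothetical and not part of doc2vec.

```r
# Toy stand-in for as.matrix(model, which = "docs", normalize = FALSE);
# the values and row names are made up for illustration.
toy <- matrix(c(3, 4,
                1, 0,
                0, 2),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("doc1", "doc2", "doc3"), NULL))

# Hypothetical helper: scale every row to unit Euclidean (L2) length.
# Assumption: this resembles what normalize = TRUE does; not confirmed here.
l2_normalize <- function(m) {
  norms <- sqrt(rowSums(m^2))  # one norm per row
  m / norms                    # recycling divides row i by norms[i]
}

unit <- l2_normalize(toy)
rowSums(unit^2)  # squared length of each row after scaling
```

With unit-length rows, the dot product of two rows equals their cosine similarity, which is one common reason to normalize embeddings before comparing them.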

Value

a dense numeric matrix with one row per document (which = 'docs') or per word (which = 'words'). The row names are the documents or words on which the model was trained.
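
Because the row names identify the documents or words, a single vector can be looked up by name and compared with all others. The following is a minimal sketch on a toy matrix in base R; emb and cosine_to are hypothetical names used only for illustration and are not functions of the doc2vec package.

```r
# Toy stand-in for the matrix returned by as.matrix(model, which = "words");
# the row names play the role of the words in the model vocabulary.
emb <- matrix(c(1, 0,
                0, 1,
                1, 1),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("alpha", "beta", "gamma"), NULL))

# Hypothetical helper: cosine similarity of one named row against all rows,
# sorted from most to least similar.
cosine_to <- function(m, key) {
  v <- m[key, ]                                   # look up a row by its name
  sims <- as.vector(m %*% v) /
    (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))         # dot product / product of norms
  names(sims) <- rownames(m)
  sort(sims, decreasing = TRUE)
}

sims <- cosine_to(emb, "gamma")
sims
```

The same pattern should apply to a real embedding matrix, with key set to a document identifier or a word from the training data.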

See Also

paragraph2vec, read.paragraph2vec

Examples

library(doc2vec)
library(tokenizers.bpe)
data(belgium_parliament, package = "tokenizers.bpe")
# Keep the French texts that are non-empty and shorter than 1000 words
x <- subset(belgium_parliament, language %in% "french")
x <- subset(x, nchar(text) > 0 & txt_count_words(text) < 1000)

# Small PV-DM model
model <- paragraph2vec(x = x, type = "PV-DM",   dim = 15,  iter = 5)
# Larger PV-DBOW model; this overwrites the model fitted above
model <- paragraph2vec(x = x, type = "PV-DBOW", dim = 100, iter = 20)

# Document and word vectors, normalized (the default)
embedding <- as.matrix(model, which = "docs")
embedding <- as.matrix(model, which = "words")
# The same matrices without normalization
embedding <- as.matrix(model, which = "docs", normalize = FALSE)
embedding <- as.matrix(model, which = "words", normalize = FALSE)
