doc2vec (version 0.2.2)

as.matrix.paragraph2vec: Get the document or word vectors of a paragraph2vec model

Description

Get the document or word vectors of a paragraph2vec model as a dense matrix.

Usage

# S3 method for paragraph2vec
as.matrix(
  x,
  which = c("docs", "words"),
  normalize = TRUE,
  encoding = "UTF-8",
  ...
)

Value

a dense matrix with one row per document or word vector; the row names are the documents or words on which the model was trained

Arguments

x

a paragraph2vec model as returned by paragraph2vec or read.paragraph2vec

which

which vectors to return: either 'docs' (document vectors) or 'words' (word vectors)

normalize

logical indicating whether to normalize the embeddings. Defaults to TRUE.

encoding

the encoding to set on the row names of the returned matrix. Defaults to 'UTF-8'.

...

not used
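
This page does not state which normalization the normalize argument applies. As an illustration only, the sketch below assumes it means scaling each row to unit Euclidean (L2) length; that is an assumption, not something confirmed here. The matrix emb is a made-up stand-in for an unnormalized result, so the block runs without a trained model.

```r
# Toy stand-in for as.matrix(model, which = "docs", normalize = FALSE);
# the values and row names are invented for illustration.
emb <- matrix(c(3, 4,
                1, 0,
                0, 2),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("doc_1", "doc_2", "doc_3"), NULL))

# ASSUMPTION: normalization divides each row by its Euclidean norm.
row_norms <- sqrt(rowSums(emb^2))
emb_unit  <- emb / row_norms   # the vector is recycled down the rows

print(emb_unit)
print(sqrt(rowSums(emb_unit^2)))   # row lengths after scaling
```

If that assumption holds, rows of a normalized matrix can be compared with a plain dot product, since the dot product of unit-length vectors equals their cosine similarity.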

See Also

paragraph2vec, read.paragraph2vec

Examples

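if(require(tokenizers.bpe)){
library(doc2vec)          # provides paragraph2vec and txt_count_words
library(tokenizers.bpe)   # provides the belgium_parliament data
data(belgium_parliament, package = "tokenizers.bpe")
# keep the French texts that are non-empty and shorter than 1000 words
x <- subset(belgium_parliament, language %in% "french")
x <- subset(x, nchar(text) > 0 & txt_count_words(text) < 1000)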

# small PV-DM model
model <- paragraph2vec(x = x, type = "PV-DM",   dim = 15,  iter = 5)
# larger PV-DBOW model; it replaces the model above and takes longer to train
model <- paragraph2vec(x = x, type = "PV-DBOW", dim = 100, iter = 20)

# normalized vectors (normalize = TRUE is the default)
embedding <- as.matrix(model, which = "docs")
embedding <- as.matrix(model, which = "words")
# vectors without normalization
embedding <- as.matrix(model, which = "docs", normalize = FALSE)
embedding <- as.matrix(model, which = "words", normalize = FALSE)
} # End of main if statement running only if the required packages are installed
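
Because the result is an ordinary matrix with documents or words as row names, individual vectors can be selected by name and compared with base R. The following is a minimal sketch, not part of the package: emb is a made-up stand-in for an embedding matrix, and cosine_to_all is a hypothetical helper introduced only for this illustration.

```r
# Toy stand-in for the result of as.matrix(model, which = "docs");
# the row names and values are invented.
emb <- rbind(doc_a = c(1,   0,   0),
             doc_b = c(0.9, 0.1, 0),
             doc_c = c(0,   1,   0))

# Hypothetical helper: cosine similarity between one named row and all rows.
cosine_to_all <- function(m, name) {
  v <- m[name, ]
  sims <- as.vector(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))
  names(sims) <- rownames(m)
  sims
}

sims <- cosine_to_all(emb, "doc_a")
# rows ordered from most to least similar to doc_a
print(sort(sims, decreasing = TRUE))
```

The helper divides by both vector lengths, so it gives the same ranking whether or not the matrix was normalized.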
