ruimtehol (version 0.3.2)

range.textspace: Get the scale of embedding similarities alongside a Starspace model

Description

Calculates the embedding similarities between two embedding matrices and returns the range, the quantiles, and a histogram of the resulting similarities.

Usage

# S3 method for textspace
range(
  x,
  from = as.matrix(x),
  to = as.matrix(x, type = "labels"),
  probs = seq(0, 1, by = 0.01),
  breaks = "scott",
  ...
)

Value

a list with elements

  • range: the range of the embedding similarities between from and to

  • quantile: the quantiles of the embedding similarities between from and to

  • hist: the histogram of the embedding similarities between from and to
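The three elements can be pictured with a minimal base R sketch. It does not call ruimtehol: the matrices `from` and `to` are random, hypothetical stand-ins for embedding matrices, and cosine similarity is assumed as the similarity measure, so the package's actual computation may differ.

```r
# Hypothetical stand-ins for two embedding matrices
# (rows = terms or labels, columns = embedding dimensions)
set.seed(42)
from <- matrix(rnorm(20 * 10), nrow = 20, ncol = 10)
to   <- matrix(rnorm(5 * 10),  nrow = 5,  ncol = 10)

# Scale every row to unit length; the vector of row norms is recycled down the columns
unit <- function(m) m / sqrt(rowSums(m^2))

# Cosine similarity (assumed metric) between each row of `from` and each row of `to`
similarities <- unit(from) %*% t(unit(to))

# Summaries mirroring the documented defaults for `probs` and `breaks`
probs <- seq(0, 1, by = 0.01)
result <- list(
  range    = range(similarities),
  quantile = quantile(similarities, probs = probs),
  hist     = hist(similarities, breaks = "scott", plot = FALSE)
)
str(result$range)
```

Here `result$hist` is a plain `histogram` object, so it can be passed to `plot()` in the same way as the `hist` element described above.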

Arguments

x

an object of class textspace as returned by starspace or starspace_load_model

from

an embedding matrix. Defaults to the embeddings of all the labels and the words from the model.

to

an embedding matrix. Defaults to the embeddings of all the labels.

probs

numeric vector of probabilities, each between 0 and 1. Passed on to quantile

breaks

passed on to hist

...

other parameters passed on to hist

Examples

data(dekamer, package = "ruimtehol")
dekamer <- subset(dekamer, depotdat < as.Date("2017-02-01"))
dekamer$text <- strsplit(dekamer$question, "\\W")
dekamer$text <- lapply(dekamer$text, FUN = function(x) setdiff(x, ""))
dekamer$text <- sapply(dekamer$text, 
                       FUN = function(x) paste(x, collapse = " "))
dekamer$question_theme_main <- gsub(" ", "-", dekamer$question_theme_main)

set.seed(123456789)
model <- embed_tagspace(x = tolower(dekamer$text), 
                        y = dekamer$question_theme_main, 
                        early_stopping = 0.8, 
                        dim = 10, minCount = 5)
ranges <- range(model)
ranges$range
ranges$quantile
plot(ranges$hist, main = "Histogram of embedding similarities")