Learn R Programming

ruimtehol (version 0.3.2)

embed_docspace: Build a Starspace model for content-based recommendation

Description

Build a Starspace model for content-based recommendation (docspace). For example a user clicks on a webpage and this webpage contains a bunch or words.

Usage

embed_docspace(
  x,
  model = "docspace.bin",
  early_stopping = 0.75,
  useBytes = FALSE,
  ...
)

Value

an object of class textspace as returned by starspace.

Arguments

x

a data.frame with user interest containing the columns user_id, doc_id and text The user_id is an identifier of a user The doc_id is just an article or document identifier the text column is a character field which contains words which are part of the doc_id, words should be separated by a space and should not contain any tab characters

model

name of the model which will be saved, passed on to starspace

early_stopping

the percentage of the data that will be used as training data. If set to a value smaller than 1, 1-early_stopping percentage of the data which will be used as the validation set and early stopping will be executed. Defaults to 0.75.

useBytes

set to TRUE to avoid re-encoding when writing out train and/or test files. See writeLines for details

...

further arguments passed on to starspace except file, trainMode and fileFormat

Examples

Run this code
library(udpipe)
data(dekamer, package = "ruimtehol")
data(dekamer_theme_terminology, package = "ruimtehol")
## Which person is interested in which theme (aka document)
x <- table(dekamer$aut_person, dekamer$question_theme_main)
x <- as.data.frame(x)
colnames(x) <- c("user_id", "doc_id", "freq")
## Characterise the themes (aka document)
docs <- split(dekamer_theme_terminology, dekamer_theme_terminology$theme)
docs <- lapply(docs, FUN=function(x){
  data.frame(theme = x$theme[1], text = paste(x$term, collapse = " "),
             stringsAsFactors=FALSE)
})
docs <- do.call(rbind, docs)

## Build a model
train <- merge(x, docs, by.x = "doc_id", by.y = "theme")
train <- subset(train, user_id %in% sample(levels(train$user_id), 4))
set.seed(123456789)
model <- embed_docspace(train, dim = 10, early_stopping = 1)
plot(model)

Run the code above in your browser using DataLab