predict.BTM: Predict function for a Biterm Topic Model

Description

Classify new text alongside the biterm topic model.

To infer the topics in a document, it is assumed that the topic proportions of a document is driven by the expectation of the topic proportions of biterms generated from the document.

Usage

# S3 method for BTM
predict(object, newdata, type = c("sum_b", "sub_w", "mix"), ...)

Value

a matrix containing containing P(z|d) - the probability of the topic given the biterms.

The matrix has one row for each unique doc_id (context identifier) which contains words part of the dictionary of the BTM model and has K columns, one for each topic.

Arguments

object

an object of class BTM as returned by BTM

newdata

a tokenised data frame containing one row per token with 2 columns

the first column is a context identifier (e.g. a tweet id, a document id, a sentence id, an identifier of a survey answer, an identifier of a part of a text)
the second column is a column called of type character containing the sequence of words occurring within the context identifier

type

character string with the type of prediction. Either one of 'sum_b', 'sub_w' or 'mix'. Default is set to 'sum_b' as indicated in the paper, indicating to sum over the the expectation of the topic proportions of biterms generated from the document. For the other approaches, please inspect the paper.

...

not used

References

Xiaohui Yan, Jiafeng Guo, Yanyan Lan, Xueqi Cheng. A Biterm Topic Model For Short Text. WWW2013, https://github.com/xiaohuiyan/BTM, https://github.com/xiaohuiyan/xiaohuiyan.github.io/blob/master/paper/BTM-WWW13.pdf

Examples

Run this code

if(require(udpipe)){
library(udpipe)
data("brussels_reviews_anno", package = "udpipe")
x <- subset(brussels_reviews_anno, language == "nl")
x <- subset(x, xpos %in% c("NN", "NNP", "NNS"))
x <- x[, c("doc_id", "lemma")]
model  <- BTM(x, k = 5, iter = 5, trace = TRUE)
scores <- predict(model, newdata = x, type = "sum_b")
scores <- predict(model, newdata = x, type = "sub_w")
scores <- predict(model, newdata = x, type = "mix")
head(scores)
} # End of main if statement running only if the required packages are installed

Run the code above in your browser using DataLab