predict.textmodel_lda: Prediction method for textmodel_lda

Description

Predicts the topics of documents using a fitted LDA model. Prediction is performed by Gibbs sampling, using the word-topic allocations of the fitted LDA. The result can differ from topics() even for the same documents, because predict() runs additional sampling iterations.
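The following is a simplified base-R sketch of how Gibbs-sampling prediction for new documents can work. It is not seededlda's own implementation, which is written in C++. It assumes the fitted model's topic-word counts are available as a K × V matrix `nkw` and holds them fixed, and the function name `predict_topics_sketch` and the default values of `alpha` and `beta` are hypothetical. Each new document's tokens start with random topics and are then resampled for `max_iter` iterations. This fresh round of sampling is why predicted topics need not match topics() for the same documents.

```r
# Hypothetical sketch: "fold in" new documents with collapsed Gibbs sampling.
# Assumption: topic-word counts from the fitted model (nkw, K x V) stay fixed;
# only the topic assignments of the new documents' tokens are resampled.
predict_topics_sketch <- function(docs, nkw, alpha = 0.5, beta = 0.1,
                                  max_iter = 200, seed = 1) {
  set.seed(seed)                        # fixed seed for reproducibility
  K <- nrow(nkw)
  V <- ncol(nkw)
  nk <- rowSums(nkw)                    # total words per topic in the fitted model
  theta <- matrix(0, length(docs), K)
  for (d in seq_along(docs)) {
    w <- docs[[d]]                      # word indices (1..V) of document d
    z <- sample.int(K, length(w), replace = TRUE)  # random initial topics
    ndk <- tabulate(z, nbins = K)       # topic counts within document d
    for (iter in seq_len(max_iter)) {
      for (i in seq_along(w)) {
        ndk[z[i]] <- ndk[z[i]] - 1      # remove the current assignment
        # full conditional: p(word | topic) from fixed counts x p(topic | doc)
        p <- (nkw[, w[i]] + beta) / (nk + V * beta) * (ndk + alpha)
        z[i] <- sample.int(K, 1, prob = p)
        ndk[z[i]] <- ndk[z[i]] + 1      # add the new assignment
      }
    }
    theta[d, ] <- (ndk + alpha) / (length(w) + K * alpha)  # document-topic proportions
  }
  theta
}

# Toy model: topic 1 uses words 1-2, topic 2 uses words 3-4
nkw <- rbind(c(50, 50, 0, 0),
             c(0, 0, 50, 50))
docs <- list(c(1, 2, 1, 1), c(3, 4, 4, 3))
theta <- predict_topics_sketch(docs, nkw, max_iter = 50)
max.col(theta)  # most likely topic per document: 1 2
```

Because the sampler starts from random assignments, intermediate states depend on the seed and on `max_iter`; with the clearly separated toy counts above, the first document is assigned to topic 1 and the second to topic 2.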

Usage

# S3 method for textmodel_lda
predict(
  object,
  newdata = NULL,
  max_iter = 2000,
  verbose = quanteda_options("verbose"),
  ...
)

Arguments

object

a fitted LDA textmodel

newdata

dfm on which prediction should be made

max_iter

the maximum number of iterations in Gibbs sampling.

verbose

logical; if TRUE, print diagnostic information during fitting.

...

not used

References

Lu, Bin et al. (2011). "Multi-aspect Sentiment Analysis with Topic Models". doi:10.5555/2117693.2119585. Proceedings of the 2011 IEEE 11th International Conference on Data Mining Workshops.

Watanabe, Kohei & Zhou, Yuan (2020). "Theory-Driven Analysis of Large Corpora: Semisupervised Topic Classification of the UN Speeches". doi:10.1177/0894439320907027. Social Science Computer Review.

Examples

# NOT RUN {
library(quanteda)
library(seededlda)

data("data_corpus_moviereviews", package = "quanteda.textmodels")
corp <- head(data_corpus_moviereviews, 500)
toks <- tokens(corp, remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE)
dfmt <- dfm(toks) %>%
    dfm_remove(stopwords('en'), min_nchar = 2) %>%
    dfm_trim(min_termfreq = 0.90, termfreq_type = "quantile",
             max_docfreq = 0.1, docfreq_type = "prop")

# unsupervised LDA: fit 6 topics on the first 450 documents
lda <- textmodel_lda(head(dfmt, 450), 6)
terms(lda)
topics(lda)
# predict the topics of the remaining 50 documents
predict(lda, newdata = tail(dfmt, 50))

# semisupervised LDA with seed words for each topic
dict <- dictionary(list(people = c("family", "couple", "kids"),
                        space = c("alien", "planet", "space"),
                        monster = c("monster*", "ghost*", "zombie*"),
                        war = c("war", "soldier*", "tanks"),
                        crime = c("crime*", "murder", "killer")))
slda <- textmodel_seededlda(dfmt, dict, residual = TRUE, min_termfreq = 10)
terms(slda)
topics(slda)
# }
