stm (version 1.0.0)

searchK: Computes diagnostic values for models with different values of K (number of topics).

Description

With user-specified initialization, this function runs selectModel for each of the user-specified topic numbers and computes diagnostic properties for the returned models. These include exclusivity, semantic coherence, heldout likelihood, bound, lbound, and residual.

Usage

searchK(documents, vocab, K, init.type = "Spectral",
        N = floor(.1 * length(documents)), proportion = .5,
        heldout.seed = NULL, M = 10, ...)

Arguments

documents
The documents to be used for the stm model
vocab
The vocabulary to be used for the stm model
K
A vector of different topic numbers
init.type
The method of initialization. Must be either Latent Dirichlet Allocation (LDA), Dirichlet Multinomial Regression Topic Model (DMR), a random initialization, 'spectral', or a previous STM object. If an initialization other than spectral is
N
Number of documents to be partially held out. Defaults to 10% of the documents, rounded down.
proportion
Proportion of each partially held-out document to hold out.
heldout.seed
If desired, a seed to use when holding out documents for later heldout likelihood computation
M
M value for the exclusivity computation (the number of top words per topic that are considered).
...
Other diagnostics parameters.
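
The following base-R sketch illustrates how the N and proportion defaults interact. It is a simplified illustration and not the package's own implementation: the toy corpus, the helper split_one, and the variable names held_docs and splits are hypothetical. It assumes the stm document format, in which each document is a two-row matrix of vocabulary indices and counts.

```r
# Toy corpus in the stm document format: each document is a 2-row matrix
# (row 1 = vocabulary index, row 2 = count). All values are made up.
set.seed(1)
documents <- lapply(1:30, function(i) {
  idx <- sort(sample(1:50, 8))
  rbind(idx, sample(1:3, 8, replace = TRUE))
})

# Defaults from the Usage section: 10% of the documents, rounded down.
N <- floor(.1 * length(documents))
proportion <- .5

# Hypothetical helper: move roughly `proportion` of one document's tokens
# into a held-out set and keep the rest for fitting.
split_one <- function(doc, proportion) {
  tokens <- rep(doc[1, ], times = doc[2, ])   # expand counts into tokens
  n_out <- floor(proportion * length(tokens))
  out <- sample(seq_along(tokens), n_out)
  list(kept = tokens[-out], heldout = tokens[out])
}

# Pick N documents at random and partially hold each of them out.
held_docs <- sample(seq_along(documents), N)
splits <- lapply(documents[held_docs], split_one, proportion = proportion)
```

With 30 toy documents, N is 3, so three documents each lose about half of their tokens to the held-out set; the remaining 27 documents are left untouched.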

Value

  • exclus: Exclusivity of each model.
  • semcoh: Semantic coherence of each model.
  • heldout: Heldout likelihood for each model.
  • residual: Residual for each model.
  • bound: Bound for each model.
  • lbound: lbound for each model.
  • em.its: Total number of EM iterations used in fitting the model.
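
One way to compare candidate values of K is to line the diagnostics up side by side. The sketch below uses a hand-built data frame named diagnostics (a hypothetical name) filled with placeholder numbers; these are not output from any fitted model, and the layout is an assumption rather than the structure searchK returns. As a rough guide, a higher heldout likelihood and a lower residual are preferable, while exclusivity and semantic coherence often pull in opposite directions and are best read together.

```r
# Placeholder diagnostics, one row per candidate K. The numbers are invented
# for illustration only and do NOT come from a fitted model.
diagnostics <- data.frame(
  K        = c(5, 10, 15),
  exclus   = c(8.9, 9.2, 9.4),
  semcoh   = c(-60, -75, -90),
  heldout  = c(-5.10, -5.02, -5.05),
  residual = c(1.30, 1.15, 1.12)
)

# K with the highest heldout likelihood and K with the lowest residual.
best_heldout  <- diagnostics$K[which.max(diagnostics$heldout)]
best_residual <- diagnostics$K[which.min(diagnostics$residual)]

# Show coherence and exclusivity together, most coherent model first.
diagnostics[order(-diagnostics$semcoh), c("K", "semcoh", "exclus")]
```

When the criteria disagree, as they do in these placeholder values, the choice of K remains a judgment call informed by how interpretable the topics are.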

Examples

library(stm)

# Preprocess the open-ended responses in the gadarian data
temp <- textProcessor(documents = gadarian$open.ended.response, metadata = gadarian)
out <- prepDocuments(temp$documents, temp$vocab, temp$meta)
documents <- out$documents
vocab <- out$vocab
meta <- out$meta

set.seed(02138)
K <- c(5, 10, 15)  # candidate numbers of topics
kresult <- searchK(documents, vocab, K, prevalence = ~treatment + s(pid_rep), data = meta)