searchK: Computes diagnostic values for models with different values of K (number of topics).

Description

With user-specified initialization, this function runs selectModel for different user-specified topic numbers and computes diagnostic properties for each returned model. These include exclusivity, semantic coherence, heldout likelihood, bound, lbound, and residual.

Usage

searchK(documents, vocab, K, init.type = "Spectral", 
                    N=floor(.1*length(documents)), proportion=.5, heldout.seed=NULL,
                    M=10,...)

Arguments

documents

The documents to be used for the stm model

vocab

The vocabulary to be used for the stm model

K

A vector of different topic numbers

init.type

The method of initialization. Must be either Latent Dirichlet Allocation (LDA), Dirichlet Multinomial Regression Topic Model (DMR), a random initialization, 'spectral', or a previous STM object. If an initialization other than spectral is

N

Number of docs to be partially held out

proportion

Proportion of docs to be held out.

heldout.seed

If desired, a seed to use when holding out documents for later heldout likelihood computation

M

M value for exclusivity computation

...

Other diagnostics parameters.
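
As a rough check of the default shown in Usage, N is one tenth of the number of documents, rounded down. The sketch below uses a hypothetical placeholder corpus of 257 empty documents purely to show the arithmetic; it is not real data and does not call searchK.

```r
# Hypothetical stand-in for a corpus of 257 documents (placeholder, not real data).
documents <- vector("list", 257)

# Default from the Usage signature: N = floor(.1 * length(documents))
N <- floor(.1 * length(documents))
print(N)  # 25 documents would be partially held out by default
```

Passing N explicitly overrides this default, which may be worth doing for very small or very large corpora.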

Value

exclus

Exclusivity of each model.

semcoh

Semantic coherence of each model.

heldout

Heldout likelihood for each model.

residual

Residual for each model.

bound

Bound for each model.

lbound

lbound for each model.

em.its

Total number of EM iterations used in fitting the model.

Examples

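library(stm)  # provides searchK, textProcessor, prepDocuments and the gadarian data

# Candidate numbers of topics to compare
K <- c(5, 10, 15)

# Preprocess the open-ended responses and prepare the corpus
temp <- textProcessor(documents = gadarian$open.ended.response, metadata = gadarian)
out <- prepDocuments(temp$documents, temp$vocab, temp$meta)
documents <- out$documents
vocab <- out$vocab
meta <- out$meta

# Set a seed for reproducibility
set.seed(02138)
kresult <- searchK(documents, vocab, K, prevalence = ~treatment + s(pid_rep), data = meta)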
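
Once the search has finished, the diagnostics can be compared across the values of K. The following is a minimal sketch, assuming the stm package is installed, that `kresult` exists from the call above, and that the returned object keeps its diagnostics in a `results` element (an assumption about the object's layout, not something stated on this page). The `inspected` flag is a hypothetical helper used only for illustration.

```r
# Sketch only: runs the inspection step if stm is available and `kresult` exists.
inspected <- FALSE  # hypothetical flag recording whether the block ran

if (requireNamespace("stm", quietly = TRUE) && exists("kresult")) {
  # Assumption: the per-K diagnostics are stored in the `results` element.
  print(kresult$results)

  # Plot method for searchK objects: diagnostics plotted against K.
  plot(kresult)

  inspected <- TRUE
}
```

When reading the output, it may help to look at several diagnostics together rather than picking K from a single one.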
