searchK: Computes diagnostic values for models with different values of K (number of topics).

Description

With user-specified initialization, this function runs selectModel for different user-specified topic numbers and computes diagnostic properties for the returned model. These include exclusivity, semantic coherence, heldout likelihood, bound, lbound, and residual dispersion.

Usage

searchK(documents, vocab, K, init.type = "Spectral", N = floor(0.1 *
  length(documents)), proportion = 0.5, heldout.seed = NULL, M = 10,
  cores = 1, ...)

Arguments

documents

The documents to be used for the stm model

vocab

The vocabulary to be used for the stm model

K

A vector of different topic numbers

init.type

The method of initialization. See stm for options. Note that the default option here is different from the main function.

N

Number of docs to be partially held out

proportion

Proportion of docs to be held out.

heldout.seed

If desired, a seed to use when holding out documents for later heldout likelihood computation

M

M value for exclusivity computation

cores

Number of CPUs to use for parallel computation

...

Other diagnostics parameters.
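
As a small illustration of the default for N shown in Usage, the sketch below evaluates floor(0.1 * length(documents)) on a placeholder corpus. The corpus size of 341 is an assumption chosen only for the arithmetic; it is not taken from any dataset described here.

```r
# Hypothetical stand-in for a corpus: a list with 341 empty entries.
# Only its length matters for the default value of N.
documents <- vector("list", 341)

# Default from the Usage signature: 10 percent of the documents, rounded down.
N_default <- floor(0.1 * length(documents))
N_default
```

Passing N explicitly overrides this default.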

Value

exclus

Exclusivity of each model.

semcoh

Semantic coherence of each model.

heldout

Heldout likelihood for each model.

residual

Residual for each model.

bound

Bound for each model.

lbound

lbound for each model.

em.its

Total number of EM iterations used in fitting the model.

Details

See the vignette for interpretation of each of these measures. Each of these measures is also available in exported functions:

exclusivity: exclusivity
semantic coherence: semanticCoherence
heldout likelihood: make.heldout and eval.heldout
bound: calculated by stm; accessible via max(model$convergence$bound)
lbound: a correction to the bound that makes the bounds directly comparable: max(model$convergence$bound) + lfactorial(model$settings$dim$K)
residual dispersion: checkResiduals
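
The bound and lbound expressions above can be tried without fitting a model. The sketch below is a minimal illustration, assuming a mock object that mimics only the two fields those expressions read. The bound values and K = 5 are invented for the example and do not come from a real stm fit.

```r
# Hypothetical stand-in for a fitted stm object. Only the fields used by
# the bound and lbound formulas are mocked; the numbers are invented.
model <- list(
  convergence = list(bound = c(-1250.4, -1198.7, -1190.2)),
  settings = list(dim = list(K = 5))
)

# bound: the largest value recorded in the convergence trace.
bound <- max(model$convergence$bound)

# lbound: bound plus log(K!), which makes values comparable across K.
lbound <- bound + lfactorial(model$settings$dim$K)

c(bound = bound, lbound = lbound)
```

Because lfactorial(K) grows with K, the correction is larger for models with more topics.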

Examples

# NOT RUN {
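library(stm)
# Preprocess the open-ended responses from the gadarian data shipped with stm
temp <- textProcessor(documents = gadarian$open.ended.response, metadata = gadarian)
out <- prepDocuments(temp$documents, temp$vocab, temp$meta)
documents <- out$documents
vocab <- out$vocab
meta <- out$meta
set.seed(02138)
# Candidate numbers of topics to compare
K <- c(5, 10, 15)
# Fit one model per value of K and collect the diagnostics
kresult <- searchK(documents, vocab, K, prevalence = ~treatment + s(pid_rep), data = meta)
plot(kresult)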

# }