
Computes the multinomial dispersion of the STM residuals, following Taddy (2012).
checkResiduals(stmobj, documents, tol = 0.01)
stmobj: An STM model object for which to compute residuals.
documents: The documents corresponding to stmobj, as in stm.
tol: The tolerance parameter for calculating the degrees of freedom. Defaults to 1/100, as in Taddy (2012).
This function implements the residual-based diagnostic method of Taddy
(2012). The basic idea is that when the model is correctly specified, the
multinomial likelihood implies that the dispersion of the residuals equals 1.
Further details are available in the referenced paper, but broadly speaking
the dispersion is derived from the mean of the squared adjusted residuals.
We get the sample dispersion by dividing by the degrees of freedom
parameter. In estimating the degrees of freedom, we follow Taddy (2012) in
approximating the parameter using a tolerance threshold. The default
tolerance of 1/100 can be changed by setting the tol argument.
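The calculation described above can be illustrated on simulated data. The following is a minimal sketch, not the package's internal code. It assumes the adjusted residual takes the form (x - m*q) / sqrt(m*q*(1 - q)), where x is an observed word count, m the document length, and q the word probability. It also assumes the probabilities are known rather than estimated, so no parameters are subtracted from the degrees of freedom. All variable names are hypothetical.

```r
# Toy illustration of a multinomial dispersion statistic.
# Assumption: word probabilities q are known, so nothing is estimated.
set.seed(1)
n_docs <- 50   # number of documents
V <- 20        # vocabulary size
m <- 200       # words per document

# Word probabilities for each document (uniform here for simplicity).
q <- matrix(1 / V, nrow = n_docs, ncol = V)

# Simulate word counts from the correctly specified multinomial model.
x <- t(sapply(seq_len(n_docs), function(i) rmultinom(1, m, q[i, ])))

tol <- 0.01
expected <- m * q

# Squared adjusted residuals: each has expectation 1 under the model.
adj_resid_sq <- (x - expected)^2 / (expected * (1 - q))

# Approximate the degrees of freedom by counting expected counts above tol.
n_hat <- sum(expected > tol)
n_params <- 0  # assumption: no estimated parameters in this toy setup
df <- n_hat - n_params

D <- sum(adj_resid_sq)
dispersion <- D / df
p_value <- pchisq(D, df = df, lower.tail = FALSE)

print(dispersion)
print(p_value)
```

Because the counts are simulated from the same model used to compute the residuals, the printed dispersion should be close to 1. With a fitted topic model, the degrees of freedom would also need to account for the estimated parameters.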
The function returns the estimated sample dispersion (which equals 1 under
the data generating process) and the p-value of a chi-squared test where the
null hypothesis is that the dispersion equals 1.
Taddy, M. 'On Estimation and Selection for Topic Models'. AISTATS 2012, JMLR W&CP 22
# An example using the Gadarian data: from raw text to fitted model.
library(stm)
temp <- textProcessor(documents = gadarian$open.ended.response, metadata = gadarian)
meta <- temp$meta
vocab <- temp$vocab
docs <- temp$documents
# Drop infrequent terms and empty documents, then re-extract the aligned objects.
out <- prepDocuments(docs, vocab, meta)
docs <- out$documents
vocab <- out$vocab
meta <- out$meta
set.seed(02138)
# Maximum EM iterations set very low so the example runs quickly.
# Run your models to convergence!
mod.out <- stm(docs, vocab, 3, prevalence = ~treatment + s(pid_rep), data = meta,
               max.em.its = 5)
checkResiduals(mod.out, docs)