stm (version 1.1.3)

make.heldout: Heldout Likelihood by Document Completion

Description

Tools for making and evaluating heldout datasets.

Usage

make.heldout(documents, vocab, N=floor(.1*length(documents)), proportion=.5, seed=NULL)
eval.heldout(model, missing)

Arguments

documents
the documents to be modeled.
vocab
the vocabulary vector corresponding to the documents.
N
number of documents to be partially held out.
proportion
proportion of the words in each selected document to be held out.
seed
the seed, set for replicability
model
an stm model
missing
a missing object created by make.heldout

Details

These functions are used to create heldout datasets and to evaluate heldout likelihood using the document completion method. The basic idea is to hold out some fraction of the words in a set of documents, train the model on the remaining words, and use the document-level latent variables to evaluate the probability of the heldout portion. See the example for the basic workflow.
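
As a rough illustration of the document completion idea, the following base R sketch splits one document in stm's sparse format (a two-row matrix of term indices and counts) into observed and heldout tokens, then scores the heldout tokens under given topic proportions. The helpers `split_document` and `heldout_loglik`, and the toy `theta` and `beta` values, are hypothetical and are not part of stm; the package's own implementation may differ in its details.

```r
# Hypothetical helper (not part of stm): split one document into a
# training part and a heldout part at the token level.
# `doc` is a 2-row matrix: row 1 = term indices, row 2 = term counts.
split_document <- function(doc, proportion = 0.5) {
  tokens <- rep(doc[1, ], times = doc[2, ])   # expand counts into tokens
  n_held <- floor(proportion * length(tokens))
  # Logical mask marking which token positions are held out
  held <- seq_along(tokens) %in% sample.int(length(tokens), n_held)
  to_matrix <- function(tok) {
    tab <- table(tok)                         # re-aggregate tokens into counts
    rbind(as.integer(names(tab)), as.integer(tab))
  }
  list(train = to_matrix(tokens[!held]), heldout = to_matrix(tokens[held]))
}

# Hypothetical helper (not part of stm): per-token log-likelihood of the
# heldout words given document-topic proportions `theta` (length K) and
# topic-word probabilities `beta` (K x V matrix, rows sum to 1).
heldout_loglik <- function(heldout, theta, beta) {
  word_prob <- as.vector(theta %*% beta)      # mixture over topics, length V
  sum(heldout[2, ] * log(word_prob[heldout[1, ]])) / sum(heldout[2, ])
}

set.seed(1)
doc <- rbind(c(1L, 2L, 4L), c(3L, 2L, 5L))    # toy document with 10 tokens
parts <- split_document(doc, proportion = 0.5)

# Toy values standing in for quantities estimated from the training part
theta <- c(0.7, 0.3)
beta <- rbind(c(0.4, 0.3, 0.2, 0.1),
              c(0.1, 0.1, 0.2, 0.6))
ll <- heldout_loglik(parts$heldout, theta, beta)
```

In an actual analysis `theta` and `beta` would come from a model fitted to the training part only, so that the heldout words play no role in estimation.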

Examples

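## Not run: 
library(stm)

# Preprocess a 500-document subsample of the poliblog5k corpus
prep <- prepDocuments(poliblog5k.docs, poliblog5k.voc,
                      poliblog5k.meta, subsample=500,
                      lower.thresh=20, upper.thresh=200)
# With the defaults, hold out half the words in 10% of the documents
heldout <- make.heldout(prep$documents, prep$vocab)
documents <- heldout$documents
vocab <- heldout$vocab
meta <- prep$meta  # metadata returned by prepDocuments

# Fit a 5-topic model on the partially observed documents
stm1 <- stm(documents, vocab, 5,
            prevalence =~ rating + s(day),
            init.type="Random",
            data=meta, max.em.its=5)
# Evaluate the likelihood of the heldout words
eval.heldout(stm1, heldout$missing)
## End(Not run)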