prepDocuments(documents, vocab, meta,
              lower.thresh = 1, upper.thresh = Inf,
              subsample = NULL, verbose = TRUE)

The documents, vocab, and meta arguments take the documents, vocabulary, and metadata in the format used by stm (typically the output of textProcessor). Words appearing in no more than lower.thresh documents are dropped; upper.thresh works analogously as an upper bound and defaults to Inf, which does no filtering. subsample, if given an integer, randomly subsamples that many documents and defaults to NULL, which provides no subsampling; note that the output may contain fewer than the requested number of documents if other processing steps remove some of them. verbose controls whether progress information is printed. The returned documents, vocab, and meta are in the stm format and will be the same as the inputs if no documents are removed.

The default lower.thresh = 1 means that words which appear in only one document will be dropped. This is often advantageous, as there is little information in these words but the added cost of including them in the model can be quite large. In many cases it will be helpful to set this threshold considerably higher; if the vocabulary is in excess of 5000 entries, inference can slow quite a bit.
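For instance, a minimal sketch of a more aggressive filtering call, assuming docs, vocab, and meta are the textProcessor outputs created in the example further below; the threshold of 10 is illustrative rather than a recommendation:

# Drop words appearing in 10 or fewer documents (illustrative threshold).
out <- prepDocuments(docs, vocab, meta, lower.thresh = 10)
names(out)          # inspect the returned elements, including any dropped indices
length(out$vocab)   # vocabulary size after filtering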
If words are removed, the function returns a vector of the original indices for the dropped items. If documents are removed, it returns a vector of the removed document indices. Users with accompanying metadata or texts may want to drop those rows from the corresponding objects; a sketch of that realignment follows the example below.

The example below prepares the gadarian data for analysis:

head(gadarian)
#Process the data for analysis.
temp<-textProcessor(documents=gadarian$open.ended.response,metadata=gadarian)
meta<-temp$meta
vocab<-temp$vocab
docs<-temp$documents
out <- prepDocuments(docs, vocab, meta)
docs<-out$documents
vocab<-out$vocab
meta <- out$meta
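If the raw texts are kept in a separate vector, a minimal sketch of realigning them with the prepared documents follows. It assumes the dropped-document indices are returned under the name docs.removed (as in recent versions of stm; check names(out) if yours differs) and that the open-ended responses were carried through as a metadata column:

# Hypothetical texts vector aligned with the documents passed to prepDocuments().
texts <- temp$meta$open.ended.response
if (length(out$docs.removed) > 0) {
  texts <- texts[-out$docs.removed]   # drop rows for removed documents
}
stopifnot(length(texts) == length(out$documents))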