plotRemoved:
Plot documents, words and tokens removed at various word thresholds
Description
Plots how different lower thresholds passed to prepDocuments affect the size of the corpus.
Usage
plotRemoved(documents, lower.thresh)
Arguments
documents
The documents to be used for the stm model
lower.thresh
A vector of integers, each of which will be tested as a lower threshold for the prepDocuments function.
Value
Invisibly returns a list with the following elements:
lower.thresh
The sorted threshold values.
ndocs
The number of documents dropped for each value of the lower threshold.
nwords
The number of entries of the vocab dropped for each value of the lower threshold.
ntokens
The number of tokens dropped for each value of the lower threshold.
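Because the list is returned invisibly, it has to be assigned to a name to be inspected. The following is a minimal sketch, assuming the stm package is installed and using its bundled poliblog5k.docs data; the threshold grid and the variable name removed are arbitrary choices made for illustration.

```r
library(stm)

# Thresholds to test; this grid is an arbitrary illustrative choice.
thresholds <- seq(from = 10, to = 1000, by = 10)

# Draws the three plots and keeps the invisibly returned list.
removed <- plotRemoved(poliblog5k.docs, lower.thresh = thresholds)

# Inspect the elements described above.
str(removed)

# Example: the smallest tested threshold at which any document is dropped
# (NA if no tested threshold drops a document).
removed$lower.thresh[which(removed$ndocs > 0)[1]]
```

Each count vector is expected to have one entry per tested threshold, in the same order as removed$lower.thresh.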
Details
For a given lower threshold, prepDocuments drops words that appear in no more than that number of documents and then removes any documents left with no words. This function lets the user pass a vector of lower thresholds and observe how prepDocuments would handle each one. It produces three plots showing the number of words, the number of documents, and the total number of tokens removed as a function of the threshold value. In each plot, a dashed red line marks the total number of documents, words, or tokens, respectively.
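The counting described above can be illustrated in base R on a toy corpus. This is a sketch only, not the package's implementation. It assumes the stm document layout (a list of two-row integer matrices, with vocabulary indices in row 1 and counts in row 2) and assumes a word is dropped when its document count is at or below the threshold. The names docs and simulate_removed are hypothetical.

```r
# Toy corpus in the assumed layout: each column is (vocab index, count).
docs <- list(
  matrix(c(1L, 2L,  2L, 1L,  3L, 1L), nrow = 2),  # words 1, 2, 3
  matrix(c(1L, 1L,  2L, 3L), nrow = 2),           # words 1, 2
  matrix(c(1L, 1L,  4L, 2L), nrow = 2)            # words 1, 4
)

# Flatten to one entry per (document, word) pair.
word_ids <- unlist(lapply(docs, function(d) d[1, ]))
counts   <- unlist(lapply(docs, function(d) d[2, ]))
doc_ids  <- rep(seq_along(docs), times = vapply(docs, ncol, integer(1)))

# Number of documents in which each word appears.
doc_freq <- tabulate(word_ids)

# For one threshold, count the words, documents and tokens that would go.
simulate_removed <- function(thresh) {
  dropped   <- which(doc_freq <= thresh)       # assumed boundary rule
  is_gone   <- word_ids %in% dropped
  docs_left <- unique(doc_ids[!is_gone])       # documents that keep a word
  c(nwords  = length(dropped),
    ndocs   = length(docs) - length(docs_left),
    ntokens = sum(counts[is_gone]))
}

# One row per tested threshold.
result <- t(sapply(c(1, 2, 3), simulate_removed))
result
```

In this toy corpus no document is lost until every word it contains falls at or below the threshold, which is the pattern the document plot is meant to reveal.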