JSTOR_freqwords: Plot the most frequent words by time intervals
Description
Generates a plot of the top-ranking words (the number is set by topn) across all the documents, grouped into ranges of years. For use with JSTOR's Data for Research datasets (http://dfr.jstor.org/). For best results, run the function several times: after each run, add overly common words to the stopword list and exclude them by re-running the JSTOR_dtmofnouns function. To find the location of the English stopword list, enter this at the R prompt: paste0(.libPaths()[1], "/tm/stopwords/english.dat")
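The stopword step can be sketched in base R as follows. This is only an illustration: the words in my_stopwords are hypothetical, and the file is read only if the tm package is installed at the first library path.

```r
# Path to the English stopword list, built as described above.
stopword_path <- paste0(.libPaths()[1], "/tm/stopwords/english.dat")

# Hypothetical extra words noticed after a first look at the plot.
my_stopwords <- c("paper", "study", "figure")

# Read the default list only if the file exists at that location.
if (file.exists(stopword_path)) {
  default_stopwords <- readLines(stopword_path, warn = FALSE)
  all_stopwords <- union(default_stopwords, my_stopwords)
} else {
  all_stopwords <- my_stopwords
}
```

A vector such as my_stopwords is the kind of value the custom_stopwords argument expects.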
Arguments
the object returned by the function JSTOR_unpack1grams.
nouns
the object returned by the function JSTOR_dtmofnouns. A Document Term Matrix of nouns.
custom_stopwords
character vector of stop words to use in addition to the default set supplied by the tm package.
n
the number of years to aggregate documents by. For example, n = 5 (the default value) will create groups of all documents published in non-overlapping five-year ranges.
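To illustrate what non-overlapping n-year ranges look like, here is a minimal base R sketch. The publication years are made up, and this is not necessarily how the function groups documents internally.

```r
# Hypothetical publication years for five documents.
years <- c(1990, 1992, 1995, 1999, 2001)
n <- 5

# Break points every n years, starting at the earliest year.
breaks <- seq(min(years), max(years) + n, by = n)

# Left-closed intervals so that each year falls into exactly one range.
year_range <- cut(years, breaks = breaks, right = FALSE, dig.lab = 4)

# Number of documents in each range.
docs_per_range <- table(year_range)
```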
lowfreq
An integer for the minimum frequency of a word to be included in the plot. Default is 300.
topn
An integer for the number of top ranking words to plot. For example, topn = 20 (the default value) will plot the top 20 words for each range of years.
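The combined effect of lowfreq and topn can be sketched like this. The word counts are invented for illustration, and the sketch assumes that the frequency threshold is applied before the ranking, which the description does not confirm.

```r
# Hypothetical word counts for a single range of years.
counts <- c(data = 500, method = 420, site = 310, bone = 120)
lowfreq <- 300
topn <- 2

# Drop words below the minimum frequency.
kept <- counts[counts >= lowfreq]

# Keep the topn most frequent of the remaining words.
top_words <- head(sort(kept, decreasing = TRUE), topn)
```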
biggest
An integer to control the maximum size of the text in the plot.
Value
Returns a plot of the most frequent words in each year range, with word size scaled to frequency (accessed via freqwords$plot$plot; note that plot appears twice), and a data frame with words and counts for each year range (accessed via freqwords$freqterms).
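The access pattern can be tried on a stand-in object. The list below only mimics the documented shape of the return value; the column names and values are hypothetical, and a real call to JSTOR_freqwords would supply the actual plot and counts.

```r
# Stand-in for the returned object; not produced by JSTOR_freqwords.
freqwords <- list(
  plot = list(plot = "placeholder for the plot object"),
  freqterms = data.frame(
    year_range = c("1990-1994", "1990-1994", "1995-1999"),  # hypothetical column
    word = c("site", "bone", "site"),                        # hypothetical column
    count = c(410L, 350L, 520L),                             # hypothetical column
    stringsAsFactors = FALSE
  )
)

# The plot sits one level down, hence plot appears twice.
the_plot <- freqwords$plot$plot

# The words and counts for each year range.
term_counts <- freqwords$freqterms
```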