JSTOR_freqwords

The object returned by the function JSTOR_unpack1grams.

unpack1grams

The object returned by the function JSTOR_dtmofnouns: a document-term matrix of nouns.

nouns

A character vector of stop words to use in addition to the default set supplied by the tm package.

custom_stopwords

The number of years to aggregate documents by. For example, n = 5 (the default value) will create groups of all documents published in non-overlapping five-year ranges.

n

An integer for the minimum frequency of a word to be included in the plot. Default is 300.

lowfreq

An integer for the number of top ranking words to plot. For example, topn = 20 (the default value) will plot the top 20 words for each range of years.

topn

An integer to control the maximum size of the text in the plot.

biggest


Generates a plot of the top n words across all the documents, grouped into ranges of years. For use with JSTOR's Data for Research datasets (http://dfr.jstor.org/). For best results, run the function several times: after each run, add any uninformative common words to the stopword list and exclude them by re-running the JSTOR_dtmofnouns function. To find the location of the English stopwords list, enter this at the R prompt: paste0(.libPaths()[1], "/tm/stopwords/english.dat")


Simple exploratory text mining and document clustering of journal
articles from JSTOR's Data for Research service. Go to
\url{http://dfr.jstor.org/} and make a request for data (specify CSV as the
output format and Word Counts as the data type). Once you receive the zip
file, unzip it and start with one of the unpack functions; after that you
can use any of the other functions. For more details on installation and
usage, see \url{https://github.com/benmarwick/JSTORr/}

JSTOR_freqwords: Plot the most frequent words by time intervals

Description

Usage

Arguments

Value

Examples
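
The sketch below illustrates, on invented toy data, how the n, lowfreq and topn arguments described above interact: documents are binned into non-overlapping n-year ranges, words below the minimum frequency are dropped, and the top-ranking words are kept for each range. It uses base R only and is not the package's internal implementation; the data frame toy, its columns, and the object top_words are hypothetical names chosen for illustration.

```r
# Toy data: one row per (publication year, word) with a count.
# All values are invented for illustration only.
toy <- data.frame(
  year  = c(1990, 1990, 1993, 1996, 1996, 1998, 2001),
  word  = c("site", "lithic", "site", "site", "pottery", "pottery", "site"),
  count = c(400, 120, 350, 200, 500, 310, 50),
  stringsAsFactors = FALSE
)

n       <- 5    # width of each year range, in years
lowfreq <- 300  # minimum total frequency for a word to be kept
topn    <- 2    # number of top-ranking words to keep per range

# Non-overlapping, left-closed ranges of n years starting at the earliest year
breaks <- seq(min(toy$year), max(toy$year) + n, by = n)
toy$range <- cut(toy$year, breaks = breaks, right = FALSE, dig.lab = 4)

# Sum the counts of each word within each range
totals <- aggregate(count ~ range + word, data = toy, FUN = sum)

# Drop words whose total within a range is below the threshold
totals <- totals[totals$count >= lowfreq, ]

# Keep the topn most frequent words in each remaining range
top_words <- do.call(
  rbind,
  lapply(split(totals, totals$range, drop = TRUE), function(d) {
    head(d[order(-d$count), ], topn)
  })
)

print(top_words)

# With real data, and assuming unpack1grams and nouns were already created by
# JSTOR_unpack1grams and JSTOR_dtmofnouns, a call using the arguments
# documented above might look like this (not run):
# JSTOR_freqwords(unpack1grams, nouns, n = 5, lowfreq = 300, topn = 20)
```

Lowering lowfreq or raising topn in the toy example lets more words through, which mirrors the advice in the Description to re-run the function while refining the stopword list.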