JSTORr (version 1.0.20161214)

JSTOR_findassocs: Plot the words with the strongest correlation with a given word, by time intervals

Description

Generates a plot of the top n words in all the documents that positively correlate with a given word, in ranges of years. For use with JSTOR's Data for Research datasets (http://dfr.jstor.org/). For best results, repeat the function after adding common words to the stopword list. To learn more about editing the stopword list, see the help for the JSTOR_dtmofnouns function.
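The idea of ranking words by their positive correlation with a given word can be sketched in base R. This is only an illustration on an invented toy matrix, not the package's internal code; the object names `dtm`, `others`, `cors` and `top_assocs` are hypothetical.

```r
# Toy document-term matrix: rows are documents, columns are word counts.
# All values are invented for illustration only.
dtm <- matrix(
  c(3, 2, 0, 1,
    0, 1, 4, 0,
    2, 2, 1, 1,
    5, 3, 0, 2,
    1, 0, 3, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("doc", 1:5), c("pirates", "ships", "trade", "gold"))
)

word <- "pirates"
others <- setdiff(colnames(dtm), word)

# Pearson correlation of every other term with the target word
cors <- sapply(others, function(term) cor(dtm[, word], dtm[, term]))

# Keep positive correlations only and rank them, strongest first
top_assocs <- sort(cors[cors > 0], decreasing = TRUE)
top_assocs
```

In this toy data "trade" tends to appear where "pirates" does not, so it correlates negatively and is dropped from the ranking.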

Usage

JSTOR_findassocs(unpack1grams, nouns, word, n = 5, corlimit = 0.4, plimit = 0.05, topn = 20, biggest = 5, parallel = FALSE)

Arguments

unpack1grams
object returned by the function JSTOR_unpack1grams.
nouns
the object returned by the function JSTOR_dtmofnouns. A Document Term Matrix containing the documents.
word
The word to calculate the correlations with.
n
the number of years to aggregate documents by. For example, n = 5 (the default value) will create groups of all documents published in non-overlapping five-year ranges. Note that high n values combined with strict plimit and corlimit values will severely filter the output. For exploratory data analysis it's recommended to start with low n values and work up.
corlimit
The lower threshold value of the Pearson correlation statistic (default is 0.4).
plimit
The upper threshold value of the p-value for the Pearson correlation (default is 0.05).
topn
An integer for the number of top ranking words to plot. For example, topn = 20 (the default value) will plot the top 20 words for each range of years.
biggest
An integer to control the maximum size of the text in the plot.
parallel
logical. If TRUE, attempts to run the function on multiple cores. Note that this may actually be slower if you have only one core, limited memory, or a small data set, because of the overhead of communicating data between the cores.
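How n, corlimit and plimit interact can be sketched in base R. The snippet below is a minimal illustration under stated assumptions: the function's actual binning may start its ranges at a different year, plimit is assumed to act as an upper bound on the p-value, and all data and object names (`years`, `year_range`, `x`, `y`, `keep`) are invented.

```r
# Hypothetical publication years for eight documents
years <- c(1990, 1992, 1994, 1995, 1998, 2001, 2003, 2004)
n <- 5

# Assign each document to a non-overlapping n-year range
start <- floor(years / n) * n
year_range <- paste(start, start + n - 1, sep = "-")
counts <- table(year_range)
counts

# Thresholds matching the documented defaults
corlimit <- 0.4
plimit <- 0.05

# Invented per-document counts of the target word (x) and a candidate word (y)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2, 1, 4, 3, 6, 5, 8, 7)

# Pearson correlation and its p-value
ct <- cor.test(x, y, method = "pearson")

# Keep the candidate only if it is both strongly and significantly correlated
keep <- unname(ct$estimate) >= corlimit && ct$p.value <= plimit
keep
```

Larger n values leave fewer groups, and stricter thresholds leave fewer words within each group, which is why starting with low n values is suggested for exploration.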

Value

Returns a plot of the most frequent words per year range, with word size scaled to frequency, and a data frame with words and counts for each year range.

Examples

## findassocs <- JSTOR_findassocs(unpack1grams, nouns, "rouges")
## findassocs <- JSTOR_findassocs(unpack1grams, nouns, n = 10, "pirates", topn = 100)
## findassocs <- JSTOR_findassocs(unpack1grams, nouns, n = 5, "marines", corlimit=0.6, plimit=0.001)