Function that performs a contrastive analysis between two given sets of texts. It generates a list of words significantly preferred by a tested author (or, a collection of authors), and another list containing the words significantly avoided by the former when compared to another set of texts. Some visualizations are available.
oppose(gui = TRUE, path = NULL,
primary.corpus = NULL,
secondary.corpus = NULL,
test.corpus = NULL,
primary.corpus.dir = "primary_set",
secondary.corpus.dir = "secondary_set",
test.corpus.dir = "test_set", ...)The function returns an object of the class stylo.results:
a list of variables, including a list of words significantly preferred in the
primary set, words significantly avoided (or, preferred in the secondary set),
and possibly some other results, if applicable.
an optional argument; if switched on, a simple yet effective
graphical interface (GUI) will appear. Default value is TRUE.
if not specified, the current working directory will be used for input/output procedures (reading files, outputting the results, etc.).
the subdirectory (within the current working
directory) that contains one or more texts to be compared to a comparison
corpus. These texts can e.g. be the oeuvre by author A (to be compared
to the oeuvre of another author B) or a collection of texts by female
authors (to be contrasted with texts by male authors). If not specified,
the default subdirectory primary_set will be used.
the subdirectory (within the current working
directory) that contains a comparison corpus: a pool of texts to be
contrasted with texts from the primary.corpus. If not specified,
the default subdirectory secondary_set will be used.
the subdirectory (within the current working directory)
that contains texts to verify the discriminatory strength of the features
extracted from the primary.set and secondary.sets. Ideally,
the test.corpus.dir should contain texts known to belong to both
classes (e.g. texts written by female and male authors in the case of
a gender-oriented study). If not specified, the default subdirectory
test_set will be used. If the default subdirectory does not exist
or does not contain any texts, the validation test will not be performed.
another option is to pass a pre-processed corpus
as an argument (here: the primary set). It is assumed that this object
is a list, each element of which is a vector containing one tokenized
sample. Refer to help(load.corpus.and.parse) to get some hints
how to prepare such a corpus.
if primary.corpus is used, then you should also
prepare a similar R object containing the secondary set.
if you decide to use test corpus, you can pass it as a pre-processed R object using this argument.
any variable produced by stylo.default.settings can be set
here, in order to overwrite the default values.
Maciej Eder, Mike Kestemont
This function performs a contrastive analysis between two given sets of texts, using Burrows's Zeta (2007) in its different flavors, including Craig's extensions (Craig and Kinney, 2009). Also, the Whitney-Wilcoxon procedure as introduced by Kilgariff (2001) is available. The function generates a vector of words significantly preferred by a tested author, and another vector containing the words significantly avoided.
Eder, M., Rybicki, J. and Kestemont, M. (2016). Stylometry with R: a package for computational text analysis. "R Journal", 8(1): 107-21.
Burrows, J. F. (2007). All the way through: testing for authorship in different frequency strata. "Literary and Linguistic Computing", 22(1): 27-48.
Craig, H. and Kinney, A. F., eds. (2009). Shakespeare, Computers, and the Mystery of Authorship. Cambridge: Cambridge University Press.
Hoover, D. (2010). Teasing out authorship and style with t-tests and Zeta. In: "Digital Humanities 2010: Conference Abstracts". King's College London, pp. 168-170.
Kilgariff A. (2001). Comparing Corpora. "International Journal of Corpus Linguistics" 6(1): 1-37.
stylo, classify, rolling.classify
if (FALSE) {
# standard usage:
oppose()
# batch mode, custom name of corpus directories:
oppose(gui = FALSE, primary.corpus.dir = "ShakespeareCanon",
secondary.corpus.dir = "MarloweSamples")
}
Run the code above in your browser using DataLab