This function runs a Cosine Delta analysis (Smith and Aldridge 2011; Evert et al. 2017).
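As a rough illustration of the idea behind Cosine Delta, the base R sketch below standardises feature frequencies to z-scores and then takes the cosine distance between the questioned text and each known text. The frequency values, the text labels, and the helper `cosine_delta()` are hypothetical. This is not the package's implementation, which may differ in details such as how the distance is scaled.

```r
# Toy relative-frequency matrix: rows are texts, columns are features.
# All values are made up for illustration.
freqs <- rbind(
  Q1 = c(the = 0.060, of = 0.031, and = 0.028, to = 0.025),
  K1 = c(the = 0.058, of = 0.030, and = 0.029, to = 0.026),
  K2 = c(the = 0.045, of = 0.040, and = 0.020, to = 0.033),
  K3 = c(the = 0.050, of = 0.025, and = 0.035, to = 0.020)
)

# Standardise each feature (column) to z-scores across all texts.
z <- scale(freqs)

# Hypothetical helper: cosine distance, i.e. 1 minus the cosine similarity.
cosine_delta <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Compare the questioned text with every known text.
known <- c("K1", "K2", "K3")
scores <- sapply(known, function(k) cosine_delta(z["Q1", ], z[k, ]))
scores
```

Smaller values indicate that the questioned text is stylistically closer to that known text; in this toy data K1 is the closest to Q1.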
delta(
  q.data,
  k.data,
  tokens = "word",
  remove_punct = FALSE,
  remove_symbols = TRUE,
  remove_numbers = TRUE,
  lowercase = TRUE,
  n = 1,
  trim = TRUE,
  threshold = 150,
  features = FALSE,
  cores = NULL
)
If features is set to FALSE then the output is a data frame containing the results of all comparisons between the Q texts and the K texts. If features is set to TRUE then the output is a list containing the results data frame and the vector of features used for the analysis.
The questioned or disputed data, either as a corpus (the output of create_corpus()) or as a quanteda dfm (the output of vectorize()).
The known or undisputed data, either as a corpus (the output of create_corpus()) or as a quanteda dfm (the output of vectorize()).
The type of tokens to extract, either "word" (default) or "character".
A logical value. FALSE (default) keeps punctuation marks.
A logical value. TRUE (default) removes symbols.
A logical value. TRUE (default) removes numbers.
A logical value. TRUE (default) transforms all tokens to lower case.
The order or size of the n-grams being extracted. Default is 1.
A logical value. If TRUE (default) then only the most frequent tokens are kept.
A numeric value indicating how many most frequent tokens to keep if trim = TRUE. The default is 150.
Logical with default FALSE. If TRUE, then the output will contain the features used.
The number of cores to use for parallel processing. The default (NULL) uses one core.
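To make the effect of lowercase, trim and threshold more concrete, here is a minimal base R sketch that lower-cases two toy texts, counts word tokens, and keeps only the most frequent ones. The texts, the whitespace tokenisation, and the pooling of counts across all texts are assumptions made for illustration; the function's own preprocessing may work differently.

```r
# Two toy texts (hypothetical data).
texts <- c(
  Q = "The cat sat on the mat and the dog sat too",
  K = "A dog and a cat sat on the mat"
)

# Lower-case and split on whitespace (a crude stand-in for word tokenisation).
tokens <- strsplit(tolower(texts), "\\s+")

# Count each token across all texts (assumed pooling strategy).
counts <- table(unlist(tokens))

# Keep only the `threshold` most frequent tokens, as trim = TRUE would.
threshold <- 2
top <- names(sort(counts, decreasing = TRUE))[seq_len(threshold)]
top
```

With a realistic corpus, a threshold such as the default of 150 would typically retain mostly high-frequency function words.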
Evert, Stefan, Thomas Proisl, Fotis Jannidis, Isabella Reger, Steffen Pielström, Christof Schöch & Thorsten Vitt. 2017. Understanding and explaining Delta measures for authorship attribution. Digital Scholarship in the Humanities 32. ii4–ii16. https://doi.org/10.1093/llc/fqx023.
Smith, Peter W H & W Aldridge. 2011. Improving Authorship Attribution: Optimizing Burrows’ Delta Method*. Journal of Quantitative Linguistics 18(1). 63–88. https://doi.org/10.1080/09296174.2011.533591.
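library(idiolect)
# Treat texts 5 and 6 of enron.sample as questioned (Q) and the rest as known (K)
Q <- enron.sample[c(5:6)]
K <- enron.sample[-c(5:6)]
# Compare every Q text with every K text
delta(Q, K)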