
This function combines several of koRpus' methods to extract the 9-Feature Set for authorship detection (Brennan, Afroz & Greenstadt, 2011; Brennan & Greenstadt, 2009).
textFeatures(text, hyphen = NULL)
text: An object of class kRp.text. It can also be a list of such objects if you want to analyze more than one text at once.
hyphen: An object of class kRp.hyphen, if text has already been hyphenated. If text is a list and hyphen is not NULL, hyphen must also be a list with one object for each text, in the same order.
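Below is a minimal sketch of the list form described above. It assumes koRpus.lang.en and its hyphenation patterns are installed. Two texts are tokenized, each one is hyphenated once with koRpus' hyphen() method, and both lists are passed in matching order. The object names (text_a, text_b, hyph_a, hyph_b) and the second sample sentence are purely illustrative, and the exact shape of the combined result is not assumed here.

```r
# Sketch: analyze two texts at once, reusing hyphenation that was already done.
# Assumes koRpus.lang.en (and the hyphenation patterns it uses) is installed.
if (require("koRpus.lang.en", quietly = TRUE)) {
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  # first text: read from a file shipped with koRpus
  text_a <- tokenize(txt = sample_file, lang = "en")
  # second text: taken from a character object (illustrative sentence)
  text_b <- tokenize(
    txt = "This is a short second text. It has only two sentences.",
    format = "obj", lang = "en"
  )
  # hyphenate each text once; hyphen() returns a kRp.hyphen object
  hyph_a <- hyphen(text_a)
  hyph_b <- hyphen(text_b)
  # both lists must be in the same order: hyphen[[i]] belongs to text[[i]]
  features <- textFeatures(
    list(text_a, text_b),
    hyphen = list(hyph_a, hyph_b)
  )
  print(features)
}
```

Passing the hyphenation lets you hyphenate each text only once and reuse the result elsewhere. If you leave hyphen as NULL, the function handles hyphenation itself.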
Returns a data.frame containing the following nine features:
Number of unique words (tokens)
Complexity (TTR)
Sentence count
Average sentence length
Average syllable count
Character count (all characters, including spaces)
Letter count (without spaces, punctuation and digits)
Gunning FOG index
Flesch Reading Ease index
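To make the features in this list concrete, here is a rough base-R approximation of some of the counts, together with the textbook Flesch Reading Ease and Gunning FOG formulas. It is only a sketch. The helper names sketch_features(), flesch_re() and gunning_fog() are hypothetical, and the regex tokenizer and the case-folding of unique words are assumptions. koRpus' own tokenizer, syllable counts and language-specific index parameters will generally produce different numbers.

```r
# Hypothetical helpers illustrating the features; NOT koRpus internals.

# Naive surface counts from a raw character string
sketch_features <- function(txt) {
  # word tokens: runs of letters, optionally with one inner apostrophe
  words <- regmatches(txt, gregexpr("[[:alpha:]]+('[[:alpha:]]+)?", txt))[[1]]
  # sentences: split on terminal punctuation and drop empty pieces
  sentences <- strsplit(txt, "[.!?]+")[[1]]
  sentences <- sentences[nzchar(trimws(sentences))]
  n_words <- length(words)
  n_types <- length(unique(tolower(words)))  # case-folding is an assumption
  data.frame(
    uniq_words = n_types,
    ttr        = n_types / n_words,                    # type-token ratio
    sentences  = length(sentences),
    sent_len   = n_words / length(sentences),          # words per sentence
    chars      = nchar(txt),                           # incl. spaces
    letters    = nchar(gsub("[^[:alpha:]]", "", txt))  # letters only
  )
}

# Flesch Reading Ease with the classic English constants
flesch_re <- function(words, sentences, syllables) {
  206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
}

# Gunning FOG; "complex" words are usually those with 3+ syllables
gunning_fog <- function(words, sentences, complex_words) {
  0.4 * (words / sentences + 100 * complex_words / words)
}

sketch_features("The cat sat. The dog ran!")
flesch_re(words = 100, sentences = 5, syllables = 150)
gunning_fog(words = 100, sentences = 5, complex_words = 10)
```

For "The cat sat. The dog ran!" the sketch counts 6 tokens, 5 unique words (TTR 5/6), 2 sentences of average length 3, 25 characters and 18 letters. The two formula calls return 59.635 and 12.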
Brennan, M., Afroz, S., & Greenstadt, R. (2011). Deceiving authorship detection. Presentation at the 28th Chaos Communication Congress (28C3), Berlin, Germany.
Brennan, M., & Greenstadt, R. (2009). Practical Attacks Against Authorship Recognition Techniques. In Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence (IAAI), Pasadena, CA.
Tweedie, F. J., Singh, S., & Holmes, D. I. (1996). Neural Network Applications in Stylometry: The Federalist Papers. Computers and the Humanities, 30, 1-10.
# code is only run when the English language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  # path to a sample text shipped with koRpus
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  # tokenize the file into a kRp.text object
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  # extract the 9-Feature Set as a data.frame
  textFeatures(tokenized.obj)
}