koRpus (version 0.06-5)

textFeatures: Extract text features for authorship analysis

Description

This function combines several of koRpus' methods to extract the 9-Feature Set for authorship detection (Brennan, Afroz & Greenstadt, 2011; Brennan & Greenstadt, 2009).

Usage

textFeatures(text, hyphen = NULL)

Arguments

text
An object of class kRp.tagged, kRp.txt.freq or kRp.analysis. Can also be a list of these objects if you want to analyze more than one text at once.
hyphen
An object of class kRp.hyphen, if text has already been hyphenated. If text is a list and hyphen is not NULL, hyphen must also be a list with one object for each text, in the same order.

Value

A data.frame:
uniqWd
Number of unique words (tokens)
cmplx
Complexity (TTR)
sntCt
Sentence count
sntLen
Average sentence length
syllCt
Average syllable count
charCt
Character count (all characters, including spaces)
lttrCt
Letter count (without spaces, punctuation and digits)
FOG
Gunning FOG index
flesch
Flesch Reading Ease index
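
To give a rough idea of what the columns above measure, here is a minimal base-R sketch. It is not koRpus' implementation: the function names surface_features, flesch_re and gunning_fog are hypothetical, the regular-expression tokenizer is a simplification of what a tagged koRpus object provides, and the two readability helpers use the commonly published Flesch Reading Ease and Gunning FOG formulas with counts that you would have to supply yourself (for example from a hyphenation step). Results from textFeatures() may differ.

```r
# Illustrative approximation only; koRpus uses its own tokenizer and hyphenation.
surface_features <- function(txt) {
  # Assumption: sentences end at runs of ., ! or ? (abbreviations are not handled)
  sentences <- unlist(strsplit(txt, "[.!?]+\\s*"))
  sentences <- sentences[nzchar(sentences)]
  # Assumption: a word is a run of letters, compared case-insensitively
  words <- tolower(unlist(regmatches(txt, gregexpr("[[:alpha:]]+", txt))))
  data.frame(
    uniqWd = length(unique(words)),                 # unique words
    cmplx  = length(unique(words)) / length(words), # type-token ratio
    sntCt  = length(sentences),                     # sentence count
    sntLen = length(words) / length(sentences),     # words per sentence
    charCt = nchar(txt),                            # all characters, incl. spaces
    lttrCt = nchar(gsub("[^[:alpha:]]", "", txt))   # letters only
  )
}

# Flesch Reading Ease from raw counts (standard published formula)
flesch_re <- function(words, sentences, syllables) {
  206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
}

# Gunning FOG from raw counts; complex_words = words with three or more syllables
gunning_fog <- function(words, sentences, complex_words) {
  0.4 * ((words / sentences) + 100 * (complex_words / words))
}

surface_features("The cat sat. The dog ran!")
```

The sketch returns a one-row data.frame using the same column names as the Value section, so it can be compared side by side with the output of textFeatures() on the same text.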

References

Brennan, M., Afroz, S., & Greenstadt, R. (2011). Deceiving authorship detection. Presentation at 28th Chaos Communication Congress (28C3), Berlin, Germany.
Brennan, M. & Greenstadt, R. (2009). Practical Attacks Against Authorship Recognition Techniques. In Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence (IAAI), Pasadena, CA.
Tweedie, F.J., Singh, S., & Holmes, D.I. (1996). Neural Network Applications in Stylometry: The Federalist Papers. Computers and the Humanities, 30, 1-10.

Examples

## Not run:
set.kRp.env(TT.cmd="manual", lang="en", TT.options=list(path="~/bin/treetagger",
      preset="en"))
tagged.txt <- treetag("example_text.txt")
tagged.txt.features <- textFeatures(tagged.txt)
## End(Not run)
