
Calculate the lexical diversity or complexity of text(s).
textstat_lexdiv(x, measure = c("all", "TTR", "C", "R", "CTTR", "U", "S",
"Maas"), log.base = 10, ...)
x
:an input object, such as a document-feature matrix object
measure
:a character vector defining the measure(s) to calculate
log.base
:a numeric value defining the base of the logarithm (for measures using logs)
...
:not used
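To illustrate what the choice of logarithm base changes, here is a minimal base-R sketch. It does not call the package; the helper functions `herdan_c()` and `uber_u()` and the counts `N` and `V` are hypothetical, and the formulas are the textbook definitions of Herdan's C and Dugast's U rather than code taken from textstat_lexdiv. A ratio of two logarithms such as C is unaffected by the base, whereas U scales with it.

```r
# Hypothetical token and type counts, chosen only for illustration
N <- 1000
V <- 300

# Textbook definitions (assumed here, not copied from the package source)
herdan_c <- function(N, V, base) log(V, base) / log(N, base)
uber_u   <- function(N, V, base) log(N, base)^2 / (log(N, base) - log(V, base))

# Herdan's C: the base cancels out of the ratio
c_base10 <- herdan_c(N, V, 10)
c_base_e <- herdan_c(N, V, exp(1))

# Dugast's U: switching from base 10 to base e multiplies U by log(10)
u_base10 <- uber_u(N, V, 10)
u_base_e <- uber_u(N, V, exp(1))

c(C_10 = c_base10, C_e = c_base_e, U_10 = u_base10, U_e = u_base_e)
```

Scores for base-dependent measures are therefore only comparable across analyses that use the same log.base.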
textstat_lexdiv returns a data.frame of documents and their lexical diversity scores.
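textstat_lexdiv calculates a variety of proposed indices for lexical diversity. In the following formulae, $N$ refers to the total number of tokens and $V$ to the number of types:
"TTR"
:The ordinary Type-Token Ratio:
$$TTR = \frac{V}{N}$$
"C"
:Herdan's C (Herdan, 1960, as cited in Tweedie & Baayen, 1998; sometimes referred to as LogTTR):
$$C = \frac{\log V}{\log N}$$
"R"
:Guiraud's Root TTR (Guiraud, 1954, as cited in Tweedie & Baayen, 1998):
$$R = \frac{V}{\sqrt{N}}$$
"CTTR"
:Carroll's Corrected TTR:
$$CTTR = \frac{V}{\sqrt{2N}}$$
"U"
:Dugast's Uber Index (Dugast, 1978, as cited in Tweedie & Baayen, 1998):
$$U = \frac{(\log N)^2}{\log N - \log V}$$
"S"
:Summer's index:
$$S = \frac{\log \log V}{\log \log N}$$
"K"
:Yule's K (Yule, 1944, as cited in Tweedie & Baayen, 1998) is calculated by:
$$K = 10^4 \times \frac{\left(\sum_{X} f_X X^2\right) - N}{N^2}$$
where $f_X$ is the number of types that occur exactly $X$ times.
"Maas"
:Maas' indices ($a$, $\log V_0$):
$$a^2 = \frac{\log N - \log V}{(\log N)^2}$$
$$\log V_0 = \frac{\log V}{\sqrt{1 - \left(\frac{\log V}{\log N}\right)^2}}$$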
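As a worked illustration of the indices listed above, the following base-R sketch computes several of them by hand from a toy text. It is not the package's implementation: the text, the naive whitespace tokenisation, and the variable names are assumptions made for the example, and the formulas are the standard published definitions with base-10 logarithms.

```r
# Hypothetical toy text, tokenised naively on whitespace
# (an assumption; the package's own tokeniser may behave differently)
txt <- "the cat sat on the mat and the dog sat on the log"
tokens <- strsplit(tolower(txt), "\\s+")[[1]]

N <- length(tokens)          # total number of tokens
V <- length(unique(tokens))  # number of types (distinct tokens)

# Indices that depend only on N and V
lexdiv <- c(
  TTR  = V / N,                                   # Type-Token Ratio
  C    = log10(V) / log10(N),                     # Herdan's C
  R    = V / sqrt(N),                             # Guiraud's Root TTR
  CTTR = V / sqrt(2 * N),                         # Carroll's Corrected TTR
  U    = log10(N)^2 / (log10(N) - log10(V)),      # Dugast's Uber Index
  S    = log10(log10(V)) / log10(log10(N))        # Summer's index
)

# Yule's K needs the frequency spectrum: how many types occur X times
type_freq <- table(tokens)        # frequency of each type
spectrum  <- table(type_freq)     # number of types per frequency value
X   <- as.integer(names(spectrum))
f_X <- as.integer(spectrum)
K   <- 1e4 * (sum(f_X * X^2) - N) / N^2

round(c(lexdiv, K = K), 3)
```

With base-10 logarithms, Summer's index is only meaningful when V exceeds 10, because log10(V) must be greater than 1 for its own logarithm to be positive; very short texts like this one give a negative value.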
Covington, M.A. & McFall, J.D. (2010). Cutting the Gordian Knot: The Moving-Average Type-Token Ratio (MATTR). Journal of Quantitative Linguistics, 17(2), 94--100.
Maas, H.-D. (1972). Über den Zusammenhang zwischen Wortschatzumfang und Länge eines Textes. Zeitschrift für Literaturwissenschaft und Linguistik, 2(8), 73--96.
McCarthy, P.M. & Jarvis, S. (2007). vocd: A theoretical and empirical evaluation. Language Testing, 24(4), 459--488.
McCarthy, P.M. & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381--392.
Michalke, Meik. (2014) koRpus: An R Package for Text Analysis. Version 0.05-5. http://reaktanz.de/?c=hacking&s=koRpus
Tweedie, F.J. & Baayen, R.H. (1998). How Variable May a Constant Be? Measures of Lexical Richness in Perspective. Computers and the Humanities, 32(5), 323--352.
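# Build a document-feature matrix from the inaugural addresses after 1980
mydfm <- dfm(corpus_subset(data_corpus_inaugural, Year > 1980), verbose = FALSE)
# Compute three selected measures and print the result
(result <- textstat_lexdiv(mydfm, c("CTTR", "TTR", "U")))
# Correlate all measures with each other, dropping the first (document) column
cor(textstat_lexdiv(mydfm, "all")[,-1])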