quanteda (version 0.9.2-0)

lexdiv: calculate lexical diversity

Description

Calculate the lexical diversity or complexity of text(s).

Usage

lexdiv(x, ...)

## S3 method for class 'dfm'
lexdiv(x, measure = c("TTR", "C", "R", "CTTR", "U", "S", "Maas"), log.base = 10, ...)

## S3 method for class 'numeric'
lexdiv(x, measure = c("TTR", "C", "R", "CTTR", "U", "S", "Maas"), log.base = 10, ...)

Arguments

...
additional arguments
measure
A character vector defining the measure to calculate.
log.base
A numeric value defining the base of the logarithm.
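
To illustrate why log.base matters for some measures and not others, here is a minimal base R sketch. It assumes the conventional definitions of Herdan's C (log V / log N) and Dugast's U ((log N)^2 / (log N - log V)); the helper names herdan_c and uber_u and the counts are hypothetical and are not part of quanteda.

```r
# Hypothetical token and type counts for a single document
N <- 1000
V <- 400

# Assumed conventional definitions, written as stand-alone helpers
herdan_c <- function(N, V, base) log(V, base) / log(N, base)
uber_u   <- function(N, V, base) log(N, base)^2 / (log(N, base) - log(V, base))

c10 <- herdan_c(N, V, 10)
ce  <- herdan_c(N, V, exp(1))
u10 <- uber_u(N, V, 10)
ue  <- uber_u(N, V, exp(1))

all.equal(c10, ce)            # TRUE: C is a ratio of logs, so the base cancels
all.equal(u10 * log(10), ue)  # TRUE: U is rescaled by the log of the base
```

Under these definitions, values of U are only comparable when they were computed with the same log.base, whereas C is unaffected by the choice.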

Value

  • a vector of lexical diversity statistics, each corresponding to an input document
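
As a rough illustration of the shape of the return value, the sketch below computes a type-token ratio per document from a plain base R count matrix. The matrix stands in for a dfm and is an assumption for illustration only; it is not how quanteda stores or processes its objects.

```r
# Toy document-feature counts as a plain matrix (not a quanteda dfm)
m <- matrix(c(2, 1, 0, 1,
              1, 1, 1, 1,
              4, 0, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"),
                            c("the", "cat", "dog", "sat")))

N <- rowSums(m)      # tokens per document
V <- rowSums(m > 0)  # types (features with a nonzero count) per document

ttr <- V / N         # one statistic per document, named by document
ttr
```

The result is a named numeric vector with one element per row of the input, which matches the one-value-per-document shape described above.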

Details

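lexdiv calculates a variety of proposed indices for lexical diversity. In the following formulae, $N$ refers to the total number of tokens, and $V$ to the number of types. The measures accepted by the measure argument are conventionally defined as follows, with logarithms taken to base log.base:

  • "TTR": the ordinary Type-Token Ratio, $TTR = \frac{V}{N}$
  • "C": Herdan's C, $C = \frac{\log V}{\log N}$
  • "R": Guiraud's Root TTR, $R = \frac{V}{\sqrt{N}}$
  • "CTTR": Carroll's Corrected TTR, $CTTR = \frac{V}{\sqrt{2N}}$
  • "U": Dugast's Uber Index, $U = \frac{(\log N)^2}{\log N - \log V}$
  • "S": Summer's index, $S = \frac{\log \log V}{\log \log N}$
  • "Maas": Maas' index, $a^2 = \frac{\log N - \log V}{(\log N)^2}$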
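
The indices named in the measure argument can be sketched in base R from $N$ and $V$ alone. The function lexdiv_sketch below is a hypothetical helper, not quanteda's implementation. It assumes the conventional definitions of each index, reports the Maas index as $a^2$, and tokenizes a toy sentence naively on whitespace.

```r
# Hypothetical helper (not part of quanteda): conventional lexical diversity
# indices computed from a token count N and a type count V.
lexdiv_sketch <- function(N, V, log.base = 10) {
  lgN <- log(N, base = log.base)
  lgV <- log(V, base = log.base)
  c(TTR     = V / N,                        # Type-Token Ratio
    C       = lgV / lgN,                    # Herdan's C
    R       = V / sqrt(N),                  # Guiraud's Root TTR
    CTTR    = V / sqrt(2 * N),              # Carroll's Corrected TTR
    U       = lgN^2 / (lgN - lgV),          # Dugast's Uber Index
    S       = log(lgV, base = log.base) /
              log(lgN, base = log.base),    # Summer's index
    Maas_a2 = (lgN - lgV) / lgN^2)          # Maas' a^2
}

# Toy document, split on whitespace (an assumption for illustration)
txt    <- "the cat sat on the mat and the dog sat on the log"
tokens <- strsplit(tolower(txt), "\\s+")[[1]]
N <- length(tokens)          # total number of tokens
V <- length(unique(tokens))  # number of distinct types

res <- lexdiv_sketch(N, V)
round(res, 3)
```

For a text this short the log-of-log terms in S are not meaningful (the base-10 log of V is below 1, so its logarithm is negative); the toy input is only meant to show how each index is assembled from the two counts.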

References

Covington, M.A. & McFall, J.D. (2010). Cutting the Gordian Knot: The Moving-Average Type-Token Ratio (MATTR). Journal of Quantitative Linguistics, 17(2), 94–100.

Maas, H.-D. (1972). Über den Zusammenhang zwischen Wortschatzumfang und Länge eines Textes. Zeitschrift für Literaturwissenschaft und Linguistik, 2(8), 73–96.

McCarthy, P.M. & Jarvis, S. (2007). vocd: A theoretical and empirical evaluation. Language Testing, 24(4), 459–488.

McCarthy, P.M. & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392.

Michalke, M. (2014). koRpus: An R Package for Text Analysis. Version 0.05-5. http://reaktanz.de/?c=hacking&s=koRpus

Tweedie, F.J. & Baayen, R.H. (1998). How Variable May a Constant Be? Measures of Lexical Richness in Perspective. Computers and the Humanities, 32(5), 323–352.

Examples

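# Document-feature matrices for the post-1980 inaugural addresses,
# with and without English stopwords
mydfm <- dfm(subset(inaugCorpus, Year > 1980))
mydfmSW <- dfm(subset(inaugCorpus, Year > 1980), ignoredFeatures = stopwords("english"))

# Compare three measures on both matrices (suffix "s" = stopwords removed)
results <- data.frame(TTR = lexdiv(mydfm, "TTR"),
                      CTTR = lexdiv(mydfm, "CTTR"),
                      U = lexdiv(mydfm, "U"),
                      TTRs = lexdiv(mydfmSW, "TTR"),
                      CTTRs = lexdiv(mydfmSW, "CTTR"),
                      Us = lexdiv(mydfmSW, "U"))
results
cor(results)  # correlations between the measures across documents

# Maas index on the stopword-free matrix, transposed for display
t(lexdiv(mydfmSW, "Maas"))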