lex.div(txt, segment = 100, factor.size = 0.72, min.tokens = 9,
rand.sample = 42, window = 100, case.sens = FALSE, lemmatize = FALSE,
detailed = FALSE, measure = c("TTR", "MSTTR", "MATTR", "C", "R", "CTTR",
"U", "S", "K", "Maas", "HD-D", "MTLD", "MTLD-MA"), char = c("TTR", "MATTR",
"C", "R", "CTTR", "U", "S", "K", "Maas", "HD-D", "MTLD", "MTLD-MA"),
char.steps = 5, log.base = 10, force.lang = NULL, keep.tokens = FALSE,
corp.rm.class = "nonpunct", corp.rm.tag = c(), quiet = FALSE)
txt: see kRp.tagged-class, kRp.txt.freq-class.
log.base: see log for details.
keep.tokens: if TRUE, all raw tokens and types will be preserved in the resulting object, in a slot called tt. For the types, their frequency in the analyzed text will also be listed.
corp.rm.class: "nonpunct" has special meaning and will cause the result of kRp.POS.tags(lang, c("punct","sentc"), list.classes=TRUE) to be used.
quiet: if FALSE, short status messages will be shown. TRUE will also suppress all potential warnings regarding the validation status of measures.

Returns an object of class kRp.TTR-class.

lex.div
calculates a variety of proposed indices for lexical diversity. In the following formulae,
$N$ refers to
the total number of tokens, and $V$ to the number of types:
TTR: the ordinary type-token ratio, $TTR = \frac{V}{N}$
MSTTR: the mean segmental type-token ratio
MATTR: the moving-average type-token ratio
C: Herdan's C, $C = \frac{\lg V}{\lg N}$. Wrapper function: C.ld
R: Guiraud's root TTR, $R = \frac{V}{\sqrt{N}}$. Wrapper function: R.ld
CTTR: Carroll's corrected TTR, $CTTR = \frac{V}{\sqrt{2N}}$. Wrapper function: CTTR
U: Dugast's uber index, $U = \frac{(\lg N)^2}{\lg N - \lg V}$. Wrapper function: U.ld
S: Summer's index, $S = \frac{\lg \lg V}{\lg \lg N}$. Wrapper function: S.ld
K: Yule's K
Maas
HD-D
MTLD
MTLD-MA

By default, if the text has not yet been tagged, the language definition is queried by calling get.kRp.env(lang=TRUE) internally. If txt has already been tagged, the language definition of that tagged object is read and used instead. Set force.lang=get.kRp.env(lang=TRUE), or any other valid value, only if you want to override this default behaviour. See kRp.POS.tags for all supported languages.
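
To make the roles of $N$ and $V$ concrete, the simple ratio-based indices (TTR, C, R, CTTR, U and S) can be sketched in base R. This is only an illustration using the textbook definitions of these indices; it is not the package's implementation, and the function name ld.sketch and the sample sentence are made up for the example. Tokenization here is a plain whitespace split, which is an assumption and much cruder than a tagged text.

```r
# Illustrative sketch only: textbook definitions, not the koRpus code.
# ld.sketch is a hypothetical helper name, not part of the package.
ld.sketch <- function(tokens, log.base = 10, case.sens = FALSE) {
  # fold case unless case-sensitive counting is requested
  if (!case.sens) tokens <- tolower(tokens)
  N <- length(tokens)          # total number of tokens
  V <- length(unique(tokens))  # number of types
  lg <- function(x) log(x, base = log.base)
  c(
    TTR  = V / N,                        # type-token ratio
    C    = lg(V) / lg(N),                # Herdan's C
    R    = V / sqrt(N),                  # Guiraud's R
    CTTR = V / sqrt(2 * N),              # Carroll's CTTR
    U    = lg(N)^2 / (lg(N) - lg(V)),    # Dugast's U (Inf if V equals N)
    S    = lg(lg(V)) / lg(lg(N))         # Summer's S
  )
}

# naive whitespace tokenization of a made-up sample sentence
sample.text <- paste(
  "the quick brown fox jumps over the lazy dog and the small cat",
  "sleeps under the old tree near the river"
)
tokens <- strsplit(sample.text, " ")[[1]]
res <- ld.sketch(tokens)
print(round(res, 3))
```

Note that U is undefined when every token is unique, and S becomes unstable for very short texts because the double logarithm approaches zero; this is one reason why lex.div enforces a minimum text length for some measures.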
Maas, H.-D. (1972). Über den Zusammenhang zwischen Wortschatzumfang und Länge eines Textes. Zeitschrift für Literaturwissenschaft und Linguistik, 2(8), 73–96.
McCarthy, P.M. & Jarvis, S. (2007). vocd: A theoretical and empirical evaluation. Language Testing, 24(4), 459–488.
McCarthy, P.M. & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392.
Tweedie, F.J. & Baayen, R.H. (1998). How Variable May a Constant Be? Measures of Lexical Richness in Perspective. Computers and the Humanities, 32(5), 323–352.
kRp.POS.tags, kRp.tagged-class, kRp.TTR-class