kRp.text.analysis
Analyzes texts in various ways.
kRp.text.analysis(txt.file, tagger = "kRp.env", force.lang = NULL, desc.stat = TRUE, lex.div = TRUE, corp.freq = NULL, corp.rm.class = "nonpunct", corp.rm.tag = c(), ...)
txt.file: Either an object of class kRp.tagged-class, kRp.txt.freq-class, kRp.analysis-class or kRp.txt.trans-class, or a character vector which must be a valid path to a file containing the text to be analyzed.
tagger: A character string naming the tokenizer/tagger to use. Can be omitted if txt.file is already of class kRp.tagged-class. Defaults to "kRp.env" to get the settings by get.kRp.env. Set to "tokenize" to use tokenize.
lex.div: Logical, whether lexical diversity should be analyzed, using lex.div.
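corp.freq: An object of class kRp.corp.freq-class. If present, a frequency index for the analyzed text is computed (see details).
corp.rm.class: Word classes to be ignored in the frequency analysis. The default value "nonpunct" has special meaning and will cause the result of kRp.POS.tags(lang, c("punct","sentc"), list.classes=TRUE) to be used, so that punctuation is left out of the frequency index.
...: Additional options passed through to the function defined with tagger.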
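To illustrate what ignoring word classes means, here is a minimal base-R sketch on a hypothetical token table. The data frame, its column names token and wclass, and the class names in rm.classes are assumptions for illustration only. In koRpus, the classes to drop under "nonpunct" would come from kRp.POS.tags(lang, c("punct","sentc"), list.classes=TRUE) and depend on the language definition.

```r
# Hypothetical token table (not a koRpus object); column names are assumptions.
tokens <- data.frame(
  token  = c("The", "cat", "sat", ",", "quietly", "."),
  wclass = c("determiner", "noun", "verb", "comma", "adverb", "fullstop"),
  stringsAsFactors = FALSE
)

# Stand-in for the classes that "nonpunct" would select; the real names
# depend on the language definition used by kRp.POS.tags().
rm.classes <- c("comma", "fullstop")

# Keep only tokens whose word class is not in the removal set.
kept <- tokens[!(tokens$wclass %in% rm.classes), ]
print(kept$token)
```

Only "The", "cat", "sat" and "quietly" remain, so punctuation would not count toward the frequency index.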
Value: An object of class kRp.analysis-class.
The function is essentially a wrapper for treetag(), freq.analysis() and lex.div().
By default, if the text still has to be tagged, the language definition is queried by calling get.kRp.env(lang=TRUE) internally. If txt.file has already been tagged, the language definition of that tagged object is read and used instead. Set force.lang=get.kRp.env(lang=TRUE), or any other valid value, only if you want to forcibly overwrite this default behaviour. See kRp.POS.tags for all supported languages.
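The language precedence described above can be sketched in base R. resolve_lang is a hypothetical helper, not part of koRpus. The session default is passed in as the plain argument env.lang rather than obtained by calling get.kRp.env(lang=TRUE), so the sketch runs without koRpus installed.

```r
# Hypothetical helper (not part of koRpus) mirroring the documented precedence:
#   1. an explicit force.lang always wins;
#   2. otherwise, an already tagged object's own language is used;
#   3. otherwise, the session default is used (in koRpus this would be
#      what get.kRp.env(lang=TRUE) returns; here it is a plain argument).
resolve_lang <- function(force.lang = NULL, tagged.lang = NULL, env.lang = "en") {
  if (!is.null(force.lang)) {
    return(force.lang)
  }
  if (!is.null(tagged.lang)) {
    return(tagged.lang)
  }
  env.lang
}

print(resolve_lang(tagged.lang = "de", env.lang = "en"))   # "de": tagged object's language
print(resolve_lang(force.lang = "en", tagged.lang = "de")) # "en": forced
print(resolve_lang(env.lang = "fr"))                       # "fr": session default
```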
See also: set.kRp.env, get.kRp.env, kRp.POS.tags, lex.div
## Not run:
kRp.text.analysis("/some/text.txt")
## End(Not run)