The function kRp.text.analysis analyzes texts in various ways.

Usage

kRp.text.analysis(txt.file, tagger = "kRp.env", force.lang = NULL,
  desc.stat = TRUE, lex.div = TRUE, corp.freq = NULL,
  corp.rm.class = "nonpunct", corp.rm.tag = c(), ...)

Arguments

txt.file: Either an object of class kRp.tagged-class, kRp.txt.freq-class, kRp.analysis-class or kRp.txt.trans-class, or a character vector which must be a valid path to a file containing the text to be analyzed.
tagger: A character string defining the tokenizer/tagger command you want to use for basic text analysis. Can be omitted if txt.file is already of class kRp.tagged-class. Defaults to "kRp.env" to get the settings by get.kRp.env. Set to "tokenize" to use tokenize.
force.lang: A character string defining the language to be assumed for the text, by force.
desc.stat: Logical, whether a descriptive statistical analysis should be performed.
lex.div: Logical, whether some lexical diversity analysis should be performed, using lex.div.
corp.freq: An object of class kRp.corp.freq-class. If present, a frequency index for the analyzed text is computed (see details).
corp.rm.class: A character vector with word classes which should be ignored for frequency analysis. The default value "nonpunct" has special meaning and will cause the result of kRp.POS.tags(lang, c("punct","sentc"), list.classes=TRUE) to be used.
corp.rm.tag: A character vector with POS tags which should be ignored for frequency analysis.
...: Additional options to be passed through to the function defined with tagger.

Value

An object of class kRp.analysis-class.

Details

The function is basically a wrapper for treetag(), freq.analysis() and lex.div().

By default, if the text still has to be tagged, the language definition is queried by calling get.kRp.env(lang=TRUE) internally. If txt.file has already been tagged, the language definition of that tagged object is read and used instead. Set force.lang=get.kRp.env(lang=TRUE), or any other valid value, only if you want to override this default behaviour. See kRp.POS.tags for all supported languages.
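
The language lookup described here can be sketched in base R. This only illustrates the order of precedence and is not the package's actual code: the helper resolve.lang and its arguments tagged.lang and env.lang are hypothetical names. They stand in for the language stored in an already tagged object and for the value that get.kRp.env(lang=TRUE) would return.

```r
# Hypothetical helper illustrating the precedence of language settings;
# it is NOT part of koRpus and makes no calls into the package.
resolve.lang <- function(force.lang = NULL, tagged.lang = NULL, env.lang = "en") {
  if (!is.null(force.lang)) {
    # an explicitly forced language always wins
    return(force.lang)
  }
  if (!is.null(tagged.lang)) {
    # the text was already tagged: reuse the language of the tagged object
    return(tagged.lang)
  }
  # the text still has to be tagged: fall back to the environment setting
  env.lang
}

lang.untagged <- resolve.lang(env.lang = "en")
lang.tagged   <- resolve.lang(tagged.lang = "de", env.lang = "en")
lang.forced   <- resolve.lang(force.lang = "fr", tagged.lang = "de", env.lang = "en")
```

In this sketch the three calls resolve to the environment language, the language of the tagged object, and the forced language, in that order.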

References

[1] http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html

Examples

# NOT RUN {
kRp.text.analysis("/some/text.txt")
# }