freq.analysis analyzes texts regarding frequencies of tokens,
word classes etc.
freq.analysis(txt.file, ...)
"freq.analysis"(txt.file, corp.freq = NULL, desc.stat = TRUE, force.lang = NULL, tagger = "kRp.env", corp.rm.class = "nonpunct", corp.rm.tag = c(), tfidf = TRUE, ...)
"freq.analysis"(txt.file, corp.freq = NULL, desc.stat = TRUE, force.lang = NULL, tagger = "kRp.env", corp.rm.class = "nonpunct", corp.rm.tag = c(), tfidf = TRUE, ...)kRp.tagged-class,
kRp.txt.freq-class,
kRp.analysis-class or kRp.txt.trans-class,
or a character vector which must
be a valid path to a file containing the text to be analyzed.tagger.kRp.corp.freq-class.txt.file is already of class kRp.tagged-class. Defaults to "kRp.env" to get the settings by
get.kRp.env. Set to "tokenize" to use tokenize."nonpunct" has special meaning and will cause the result of
kRp.POS.tags(lang, c("punct","sentc"), list.classes=TRUE) to be used.corp.freq to provide appropriate idf values for the types in txt.file. Missing idf values will result in NA.kRp.txt.freq-class.
kRp.txt.freq-class.By default, if the text has yet to be tagged,
the language definition is queried by calling get.kRp.env(lang=TRUE) internally.
If txt.file has already been tagged, the language definition of that tagged object is read and used by default instead.
Set force.lang=get.kRp.env(lang=TRUE), or any other valid value, only if you want to forcibly override this default behaviour.
See kRp.POS.tags for all supported languages.
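The order in which the language is determined can be pictured with a small base R sketch. The helper resolve_lang and its argument names are hypothetical and not part of koRpus; it only mirrors the precedence described above (force.lang first, then the language of an already tagged object, then the environment setting), not the package's actual code.

```r
# Hypothetical helper, NOT a koRpus function: it only illustrates the
# precedence described in the text above.
resolve_lang <- function(tagged.lang = NULL, env.lang = NULL, force.lang = NULL) {
  # 1. an explicitly forced language always wins
  if (!is.null(force.lang)) {
    return(force.lang)
  }
  # 2. otherwise use the language stored in an already tagged object
  if (!is.null(tagged.lang)) {
    return(tagged.lang)
  }
  # 3. otherwise fall back to the environment setting
  if (is.null(env.lang)) {
    stop("no language defined")
  }
  env.lang
}

# text not yet tagged: the environment setting is used
lang.untagged <- resolve_lang(env.lang = "en")
# text already tagged as German: the tagged object's language is used
lang.tagged <- resolve_lang(tagged.lang = "de", env.lang = "en")
# force.lang overrides the language of the tagged object
lang.forced <- resolve_lang(tagged.lang = "de", env.lang = "en", force.lang = "en")
```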
get.kRp.env, kRp.tagged-class,
kRp.corp.freq-class
## Not run:
# freq.analysis("~/some/text.txt", corp.freq=my.LCC.data)
# ## End(Not run)
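The corp.freq object in the call above is what supplies idf values when tfidf=TRUE. Why a missing idf value results in NA can be shown with a minimal base R sketch. It assumes the common definition of tf-idf as relative term frequency multiplied by idf; the tokens and idf values are invented for illustration, and the exact computation inside koRpus may differ.

```r
# Hypothetical tokens of a short text (lower-cased, punctuation removed)
tokens <- c("the", "cat", "sat", "on", "the", "mat")

# Hypothetical idf values, standing in for what a corpus frequency object
# would provide; "mat" is deliberately missing
idf <- c(the = 0.1, cat = 2.0, sat = 2.5, on = 0.3)

# term frequency: relative frequency of each type in the text
counts <- table(tokens)
tf <- as.vector(counts) / length(tokens)
names(tf) <- names(counts)

# look up idf by type name; types absent from 'idf' come back as NA,
# so their tf-idf value is NA as well
tfidf <- tf * idf[names(tf)]
names(tfidf) <- names(tf)

print(tfidf)
```

Types without a corpus entry stay in the result with NA instead of being dropped, so they can be found afterwards with is.na().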