kRp.corp.freq,-class: S4 Class kRp.corp.freq
Slots
meta- Metadata on the corpora (see details).
words- Absolute word frequencies. It has at least the following columns:
num:- Some word ID from the DB, integer
word:- The word itself
lemma:- The lemma of the word
tag:- A part-of-speech tag
wclass:- The word class
lttr:- The number of characters
freq:- The frequency of that word in the corpus DB
pct:- Percentage of appearance in DB
pmio:- Appearance per million words in DB
log10:- Base 10 logarithm of word frequency
rank.avg:- Rank in corpus data, rank ties method "average"
rank.min:- Rank in corpus data, rank ties method "min"
rank.rel.avg:- Relative rank, i.e. percentile of "rank.avg"
rank.rel.min:- Relative rank, i.e. percentile of "rank.min"
inDocs:- The absolute number of documents in the corpus containing the word
idf:- The inverse document frequency
The slot might have additional columns, depending on the input material.
desc- Descriptive information. It contains six numbers from the meta information, for convenient accessibility:
tokens:- Number of running word forms
types:- Number of distinct word forms
words.p.sntc:- Average sentence length in words
chars.p.sntc:- Average sentence length in characters
chars.p.wform:- Average word form length
chars.p.word:- Average running word length
The slot might have additional columns, depending on the input material.
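To make the column descriptions of the words slot more concrete, here is a minimal base R sketch that derives several of them from raw counts. It does not call the package itself. The toy words, their counts, the document total n_docs, the ascending rank direction (most frequent word gets the highest rank) and the natural-log idf formula are all assumptions made for illustration; the package may compute these values differently.

```r
# Toy frequency table; words, counts and document counts are invented.
words <- data.frame(
  word   = c("the", "of", "cat", "dog"),
  freq   = c(500, 300, 100, 100),
  inDocs = c(10, 8, 2, 1),
  stringsAsFactors = FALSE
)

n_tokens <- sum(words$freq)  # total running words in the toy corpus
n_docs   <- 10               # hypothetical number of documents

words$lttr  <- nchar(words$word)               # number of characters
words$pct   <- 100 * words$freq / n_tokens     # percentage of appearance
words$pmio  <- 1e6 * words$freq / n_tokens     # appearance per million words
words$log10 <- log10(words$freq)               # base 10 logarithm of frequency

# Assumption: ranks ascend with frequency; only the ties method differs.
words$rank.avg <- rank(words$freq, ties.method = "average")
words$rank.min <- rank(words$freq, ties.method = "min")

# Assumption: one common idf definition, log(N / document frequency).
words$idf <- log(n_docs / words$inDocs)

print(words)
```

The two tied words ("cat" and "dog") show the difference between the ties methods: "average" assigns both the mean of the ranks they occupy, while "min" assigns both the lower one.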
Details
The slot meta simply contains all information from the "meta.txt" of the LCC[1] data and remains empty for data from a Celex[2] DB.
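As a rough illustration of how the three slots sit together, the sketch below builds a stand-in S4 class with the same slot names and leaves meta empty, as described for Celex data. The class toyCorpFreq, its data.frame slot types and the toy values are hypothetical and are not the package's actual class definition.

```r
library(methods)

# Stand-in class for illustration only; NOT the package's own definition.
setClass(
  "toyCorpFreq",
  representation(meta = "data.frame", words = "data.frame", desc = "data.frame")
)

# Invented word counts.
toy_words <- data.frame(
  word = c("the", "of", "cat", "dog"),
  freq = c(500, 300, 100, 100),
  stringsAsFactors = FALSE
)

toy <- new(
  "toyCorpFreq",
  meta  = data.frame(),  # empty, as for data without a "meta.txt"
  words = toy_words,
  desc  = data.frame(
    tokens = sum(toy_words$freq),  # number of running word forms
    types  = nrow(toy_words)       # number of distinct word forms
  )
)

slotNames(toy)            # names of the available slots
nrow(toy@meta) == 0       # TRUE when no metadata is present
slot(toy, "desc")$tokens  # slot() is equivalent to the @ operator
```

Checking whether meta has any rows is one simple way to tell the two cases apart before relying on metadata fields.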