This class is used for objects that are returned by read.corp.LCC and read.corp.celex.
meta: Metadata on the corpora (see details).
words: Absolute word frequencies. It has at least the following columns:
  num: An integer word ID from the DB
  word: The word itself
  lemma: The lemma of the word
  tag: A part-of-speech tag
  wclass: The word class
  lttr: The number of characters
  freq: The frequency of that word in the corpus DB
  pct: Percentage of appearance in DB
  pmio: Appearance per million words in DB
  log10: Base 10 logarithm of word frequency
  rank.avg: Rank in corpus data, rank ties method "average"
  rank.min: Rank in corpus data, rank ties method "min"
  rank.rel.avg: Relative rank, i.e. percentile of "rank.avg"
  rank.rel.min: Relative rank, i.e. percentile of "rank.min"
  inDocs: The absolute number of documents in the corpus containing the word
  idf: The inverse document frequency
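To make the relation between some of these columns concrete, here is a minimal base R sketch that derives pct, pmio, log10 and the two rank columns from freq. The toy data is invented, and the formulas (in particular the assumption that more frequent words receive higher ranks) are a plausible reading of the column descriptions, not necessarily what the package computes internally. The relative ranks and idf are left out because their exact definitions are not given here.

```r
# Toy frequency table, invented for illustration only
words <- data.frame(
  word = c("the", "of", "corpus", "lemma"),
  freq = c(50, 30, 10, 10),
  stringsAsFactors = FALSE
)

total <- sum(words$freq)

# Share of all running words, as percentage and per million
words$pct  <- words$freq / total * 100
words$pmio <- words$freq / total * 1e6

# Base 10 logarithm of the absolute frequency
words$log10 <- log10(words$freq)

# Ranks with two different tie-breaking methods
# (assumption: higher frequency means higher rank)
words$rank.avg <- rank(words$freq, ties.method = "average")
words$rank.min <- rank(words$freq, ties.method = "min")

print(words)
```

The two tied words ("corpus" and "lemma") show the difference between the methods: "average" assigns both the mean of the ranks they span, while "min" assigns both the lowest of those ranks.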
desc: Descriptive information. It contains six numbers from the meta information, for convenient access:
  tokens: Number of running word forms
  types: Number of distinct word forms
  words.p.sntc: Average sentence length in words
  chars.p.sntc: Average sentence length in characters
  chars.p.wform: Average word form length
  chars.p.word: Average running word length
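The difference between word forms (types) and running words (tokens) can be illustrated with a small base R sketch. The data is invented, and the calculations are only one plausible interpretation of the descriptions above; in actual objects these numbers are taken from the meta information rather than recomputed.

```r
# Toy frequency table, invented for illustration only
words <- data.frame(
  word = c("the", "of", "corpus"),
  lttr = c(3, 2, 6),
  freq = c(6, 3, 1),
  stringsAsFactors = FALSE
)

# Running word forms: every occurrence counts
tokens <- sum(words$freq)

# Distinct word forms: every word counts once
types <- nrow(words)

# Average length over distinct word forms (unweighted)
chars.p.wform <- mean(words$lttr)

# Average length over running words (weighted by frequency)
chars.p.word <- weighted.mean(words$lttr, words$freq)
```

Because short words tend to be the most frequent ones, the frequency-weighted average is usually smaller than the unweighted one, as it is in this toy example.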
bigrams: A data.frame listing all tokens that co-occurred next to each other in the corpus:
  token1: The first token
  token2: The second token that appeared right next to the first
  freq: How often the co-occurrence was present
  sig: Log-likelihood significance of the co-occurrence
cooccur: Similar to bigrams, but listing co-occurrences anywhere in one sentence:
  token1: The first token
  token2: The second token that appeared in the same sentence
  freq: How often the co-occurrence was present
  sig: Log-likelihood significance of the co-occurrence
The slot meta simply contains all information from the "meta.txt" of the LCC[1] data and remains empty for data from a Celex[2] DB.
[1] http://corpora.informatik.uni-leipzig.de/download.html
[2] http://celex.mpi.nl
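As a rough illustration of how the slots might be inspected, the following sketch defines a small stand-in S4 class with the same slot names and reads from it with slot(). The class name mockCorpFreq and its contents are hypothetical. Declaring every slot as a data.frame is also an assumption made for this example, so that it runs without the package or any corpus data. With a real object returned by read.corp.LCC or read.corp.celex, the same slot() calls should apply.

```r
library(methods)

# Hypothetical stand-in class, NOT the package's own class definition
setClass(
  "mockCorpFreq",
  representation(
    meta    = "data.frame",
    words   = "data.frame",
    desc    = "data.frame",
    bigrams = "data.frame",
    cooccur = "data.frame"
  )
)

# Mimic an object built from a Celex DB: the meta slot stays empty
obj <- new(
  "mockCorpFreq",
  meta  = data.frame(),
  words = data.frame(
    word = c("of", "the"),
    freq = c(30, 50),
    stringsAsFactors = FALSE
  )
)

# Read the word frequencies and sort them, most frequent first
freqs <- slot(obj, "words")
freqs <- freqs[order(freqs$freq, decreasing = TRUE), ]

# Check whether any meta information is available
has.meta <- nrow(slot(obj, "meta")) > 0
```

Checking the meta slot before using it is a cheap way to handle LCC and Celex data with the same code.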