Create a frequency table from either a corpus or a subcorpus. With a corpus, the frequency table is based on two attributes (structural or positional). With a subcorpus, it is based on two anchors (match, matchend, target, keyword) and a positional attribute for each anchor.
cqp_ftable(x, ...)

# S3 method for cqp_corpus
cqp_ftable(x, attribute1, attribute2, attribute1.use.id = FALSE,
  attribute2.use.id = FALSE, structural.attribute.unique.id = FALSE,
  subcorpus = NULL, ...)

# S3 method for cqp_subcorpus
cqp_ftable(x, anchor1, attribute1,
  anchor2, attribute2, cutoff = 0, ...)
attribute1: The attribute providing the modalities of the first variable of the cross-tabulation. If x is a subcorpus, positional attribute only.
attribute2: The attribute providing the modalities of the second variable of the cross-tabulation. If x is a subcorpus, positional attribute only.
attribute1.use.id: If attribute1 is a structural attribute and has values (see cqi_structural_attribute_has_values), switch between region ids (struc) and values (default).
attribute2.use.id: If attribute2 is a structural attribute and has values (see cqi_structural_attribute_has_values), switch between region ids (struc) and values (default).
structural.attribute.unique.id: Count tokens or ids. See details for more info.
subcorpus: Not implemented yet.
anchor1: The anchor for individuals of the first variable, if x is a subcorpus (one of match, matchend, target, keyword).
anchor2: The anchor for individuals of the second variable, if x is a subcorpus (one of match, matchend, target, keyword).
cutoff: Filter the frequency table.
...: Ignored.
A frequency table stored as a flat (3-column) data frame: for each observed combination of modalities, the first column gives the modality of the first variable, the second column the modality of the second variable, and the third column the observed frequency of the cooccurrence.
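A flat table of this shape can be reshaped into a conventional contingency table with base R. The sketch below does not call rcqp; the data frame and its column names (modality1, modality2, freq) are invented for illustration and may not match the names cqp_ftable actually returns.

```r
# Hypothetical flat table in the 3-column shape described above.
# Column names and values are assumptions made for this illustration.
flat <- data.frame(
  modality1 = c("Novel A", "Novel A", "Novel B"),
  modality2 = c("of", "in", "of"),
  freq      = c(12, 5, 7),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows are the first variable, columns the second.
# Combinations that do not appear in the flat table get a count of 0.
wide <- xtabs(freq ~ modality1 + modality2, data = flat)
print(wide)
```

The flat form only lists observed combinations, so it stays compact when most pairs of modalities never cooccur; the wide form is easier to read for small tables.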
Some explanations of the structural.attribute.unique.id option (see also the vignette RcqpIntroduction).
Positional attributes (and structural attributes having values) are represented
by their string values rather than by their ids. For positional
attributes this is only a matter of presentation, since each id has its own
string. For structural attributes having values, however, it may change the
counts, since these values are not unique: if two structs have the same value,
occurrences of phenomena belonging to these different structs are counted
together. You can force the use of ids rather than string values with the
attribute1.use.id and attribute2.use.id options.
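The effect of grouping by value rather than by id can be illustrated with a toy example in base R. This is only a sketch of the counting principle, not of rcqp internals; the data frame and the column names struc and value are hypothetical.

```r
# Toy token-level data, not real rcqp output: each row is one token.
# 'struc' is a hypothetical region id, 'value' the string value of that region.
tokens <- data.frame(
  struc = c(0, 0, 0, 1, 1, 2, 2, 2, 2),
  value = c("of", "of", "of", "in", "in", "of", "of", "of", "of"),
  stringsAsFactors = FALSE
)

# Grouping by string value merges regions 0 and 2, which share the value "of".
by_value <- table(tokens$value)

# Grouping by region id keeps every region separate.
by_id <- table(tokens$struc)

print(by_value)
print(by_id)
```

Here the value "of" accumulates the tokens of two distinct regions, whereas the id-based table keeps three separate rows.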
Counts are made on a token basis, i.e. each token of the corpus is an
individual on which the two modalities (attributes) are considered. If you
pass two structural attributes to cqp_ftable and one of them does not have
values, the third column counts the number of tokens. In the following
example, each line gives the length (in number of tokens, third column) of
each sentence (second column) in each novel, represented by its title:
c <- corpus("DICKENS")
f <- cqp_ftable(c, "novel_title", "s")
f[1:10, ]
If both structural attributes have values, you may want to count the number of
times the modalities cooccur, rather than the total number of tokens included
in these cooccurrences. For that purpose, use the
structural.attribute.unique.id=TRUE option. In the following example, we count
the number of times each head appears in each novel:
f <- cqp_ftable(c, "novel_title", "pp_h", structural.attribute.unique.id=TRUE)
f[1:10, ]
Here, on the contrary, we count the total number of tokens in the prepositional phrases having a given head:
f <- cqp_ftable(c, "novel_title", "pp_h")
f[1:10, ]
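The difference between the two counting modes can be reproduced on invented data with base R alone. The sketch below assumes a hypothetical token-level table in which pp_id identifies the prepositional phrase each token belongs to; it mirrors the distinction described above and makes no claim about how rcqp computes its counts.

```r
# Toy token-level data (invented): one row per token inside a prepositional phrase.
# 'pp_id' is a hypothetical region id; 'pp_h' is the head of that region.
tokens <- data.frame(
  novel_title = c(rep("Novel A", 5), rep("Novel B", 4)),
  pp_h        = c("of", "of", "of", "in", "in", "of", "of", "of", "of"),
  pp_id       = c(0, 0, 0, 1, 1, 2, 2, 3, 3),
  stringsAsFactors = FALSE
)

# Token-based count: every token contributes one unit to its (novel, head) pair.
token_counts <- aggregate(pp_id ~ novel_title + pp_h, data = tokens, FUN = length)
names(token_counts)[3] <- "freq"

# Region-based count: every distinct region contributes one unit,
# however many tokens it contains.
region_counts <- aggregate(pp_id ~ novel_title + pp_h, data = tokens,
                           FUN = function(ids) length(unique(ids)))
names(region_counts)[3] <- "freq"

print(token_counts)
print(region_counts)
```

In this toy data, "Novel B" contains two phrases headed by "of" with two tokens each, so the token-based count is 4 while the region-based count is 2.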