
Counts the number of clause units (bounded by the <##>, <#>, or <%> annotation symbols) in a multicastR table.
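To illustrate which annotation values open a clause unit, here is a minimal base R sketch. It assumes that the boundary symbol stands at the start of a graid value, possibly followed by further annotation. The helper `is_clause_boundary()` and the sample values are made up for illustration and are not part of multicastR.

```r
# Hypothetical helper, not part of multicastR.
# Assumption: a clause boundary symbol (##, # or %) appears at the start
# of a graid value, possibly followed by more annotation (e.g. "#ac").
is_clause_boundary <- function(graid) {
  grepl("^(##|#|%)", graid)
}

# made-up sample graid values
graid <- c("##", "np.h:s", "v:pred", "#ac", "pro.1:s", "%", "v:pred")

is_clause_boundary(graid)       # one logical per value
sum(is_clause_boundary(graid))  # number of clause units opened
```

Because each clause unit begins with exactly one boundary symbol, counting the flagged values is one way to count the units.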
mc_clauses(mcdata, bytext = FALSE)
mcdata: A data.table in multicastR format, containing at minimum a corpus column with the names of the corpora and a graid column with GRAID annotation values.
bytext: Logical. If FALSE (the default), count the clause units for each corpus. If TRUE, count them for each text separately.
Returns a data.table with the number of valid clause units in each corpus (or in each text, if bytext = TRUE), the total number of clause units, the number of non-analyzed clause units ("NC"), and the percentage of the total that the latter make up.
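The arithmetic behind such a summary can be sketched in base R as follows. This is not the multicastR implementation: the helper `summarise_clauses()`, its output column names, the `text` column used for grouping by text, and the convention that non-analyzed units carry `nc` directly after the boundary symbol (e.g. `##nc`) are all assumptions made for illustration.

```r
# Hypothetical sketch, not the multicastR implementation.
# Assumptions: clause units start with a graid value beginning with
# ##, # or %; non-analyzed units are marked "nc" right after the symbol.
summarise_clauses <- function(df, bytext = FALSE) {
  # keep only the rows that open a clause unit
  b <- df[grepl("^(##|#|%)", df$graid), ]
  # group by corpus, or by corpus and (hypothetical) text column
  grp <- if (bytext) paste(b$corpus, b$text, sep = "/") else b$corpus
  total <- tapply(b$graid, grp, length)                 # all clause units
  nc <- tapply(grepl("^(##|#|%)nc", b$graid), grp, sum) # non-analyzed
  data.frame(
    group  = names(total),
    valid  = as.integer(total - nc),
    total  = as.integer(total),
    nc     = as.integer(nc),
    pct_nc = round(100 * as.numeric(nc) / as.numeric(total), 2),
    stringsAsFactors = FALSE
  )
}

# made-up toy data with two corpora
toy <- data.frame(
  corpus = c("A", "A", "A", "A", "B", "B", "B"),
  text   = c("a1", "a1", "a2", "a2", "b1", "b1", "b1"),
  graid  = c("##", "np.h:s", "##nc", "#ac", "##", "v:pred", "%"),
  stringsAsFactors = FALSE
)

summarise_clauses(toy)                 # one row per corpus
summarise_clauses(toy, bytext = TRUE)  # one row per text
```

In corpus A, three clause units are opened, one of which is non-analyzed, so two are valid and the non-analyzed share is 1/3 of the total, or 33.33 percent.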
library(multicastR)

# count clause units in the most recent version
# of the Multi-CAST data, by corpus
n <- mc_clauses(multicast())

# count by text instead
m <- mc_clauses(multicast(), bytext = TRUE)

# number of clause units in the whole collection
sum(n$nClauses)