count: Get counts.

Description

Count all tokens, or the number of occurrences of a query (CQP syntax may be used).

Usage

count(.Object, ...)
# S4 method for partition
count(.Object, query = NULL, cqp = is.cqp,
  id2str = TRUE, pAttribute = getOption("polmineR.pAttribute"),
  mc = getOption("polmineR.cores"), verbose = TRUE, progress = FALSE)
# S4 method for partitionBundle
count(.Object, query, pAttribute = NULL,
  freq = FALSE, total = TRUE, mc = FALSE, progress = TRUE,
  verbose = FALSE)
# S4 method for character
count(.Object, query = NULL, cqp = is.cqp,
  pAttribute = getOption("polmineR.pAttribute"), sort = FALSE,
  id2str = TRUE, verbose = TRUE)
# S4 method for vector
count(.Object, corpus, pAttribute)

Arguments

.Object

a "partition" or "partitionBundle" object, or a character vector (length 1) providing the name of a corpus

...

further parameters

query

a character vector (one or multiple terms to be looked up); CQP syntax can be used

cqp

either logical (TRUE if query is a CQP query), or a function to check whether query is a CQP query or not (defaults to the is.cqp auxiliary function)

id2str

logical, whether to add rownames (only if query is NULL)

pAttribute

the p-attribute(s) to use

mc

logical, whether to use multicore (defaults to FALSE for the partitionBundle method)

verbose

logical, whether to be verbose

progress

logical, whether to show progress

freq

logical, if FALSE, counts will be reported, if TRUE, frequencies

total

logical, if TRUE, the sum of counts (column: TOTAL) will be appended to the data.table that is returned (the usage above shows TRUE as the default for the partitionBundle method)

sort

logical, whether to sort the resulting table of counts

corpus

name of CWB corpus
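
The cqp argument accepts either a logical value or a function that inspects the query. The following is a minimal base R sketch of how such an argument might be resolved. The names looks_like_cqp and resolve_cqp are hypothetical and not part of polmineR, and the heuristic is an assumption for illustration, not a description of what is.cqp actually does.

```r
# Hypothetical heuristic (an assumption, not polmineR's is.cqp):
# treat a query as CQP syntax if it contains a square bracket or a double quote.
looks_like_cqp <- function(query) {
  grepl("[", query, fixed = TRUE) | grepl('"', query, fixed = TRUE)
}

# Hypothetical helper: accept either a logical or a checking function.
resolve_cqp <- function(query, cqp = looks_like_cqp) {
  if (is.function(cqp)) {
    cqp <- cqp(query)  # let the function decide, one value per query
  }
  stopifnot(is.logical(cqp))
  cqp
}

resolve_cqp("Arbeit")                  # the function decides
resolve_cqp('"Arbeit" [pos = "NN"]')   # the function decides
resolve_cqp("Arbeit", cqp = TRUE)      # an explicit logical overrides the check
```

Passing a function as the default keeps the common case automatic while still letting the caller force one interpretation of the query.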

Value

a "data.table"

Details

If .Object is a partitionBundle, the data.table returned will have the queries in the columns, and as many rows as there are partitions in the partitionBundle.
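
To illustrate that layout, here is a base R sketch that uses made-up token vectors rather than a CWB corpus. The names toy_partitions and count_queries are hypothetical, and the sketch builds a plain data.frame where polmineR returns a data.table. It only mimics the shape of the result (one row per partition, one column per query, plus TOTAL), not the package's implementation.

```r
# Made-up tokens standing in for three partitions (illustrative only).
toy_partitions <- list(
  a = c("Arbeit", "und", "Zukunft", "und", "Arbeit"),
  b = c("Zukunft", "der", "Integration"),
  c = c("Arbeit", "Integration", "Arbeit", "Arbeit")
)

# Hypothetical helper: count exact matches of each query term per partition.
count_queries <- function(partitions, query, total = TRUE) {
  result <- data.frame(partition = names(partitions), stringsAsFactors = FALSE)
  for (q in query) {
    # one column per query term, one value per partition
    result[[q]] <- vapply(
      partitions, function(tokens) sum(tokens == q), integer(1),
      USE.NAMES = FALSE
    )
  }
  if (total) {
    # row-wise sum over all query columns
    result[["TOTAL"]] <- as.integer(rowSums(result[, query, drop = FALSE]))
  }
  result
}

y <- count_queries(toy_partitions, query = c("Arbeit", "Integration"))
print(y)
```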

If .Object is a character vector (length 1) and query is NULL, the count is performed for the whole corpus. The method will check whether the polmineR.Rcpp package or the cwb-lexdecode utility is available, and use whichever it finds for performance reasons.
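
Conceptually, counting without a query amounts to tabulating every distinct token. Below is a minimal base R sketch on a made-up token vector. It does not touch a CWB corpus and does not use the optimized path via polmineR.Rcpp or cwb-lexdecode, and the column names word, count and freq are assumptions for illustration.

```r
# Made-up token stream (illustrative only).
tokens <- c("Arbeit", "und", "Zukunft", "und", "Arbeit", "und")

# Tabulate every distinct token.
tab <- table(tokens)
counts <- data.frame(
  word = names(tab),
  count = as.integer(tab),
  stringsAsFactors = FALSE
)

# Relative frequency: count divided by the total number of tokens.
counts$freq <- counts$count / length(tokens)

# Sort by descending count.
counts <- counts[order(counts$count, decreasing = TRUE), ]
print(counts)
```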

Examples

# NOT RUN {
  use("polmineR.sampleCorpus")
  debates <- partition("PLPRBTTXT", list(text_id=".*"), regex=TRUE)
  count(debates, query = "Arbeit") # get frequencies for one token
  count(debates, c("Arbeit", "Freizeit", "Zukunft")) # get frequencies for multiple tokens
  
  count("PLPRBTTXT", query = c("Migration", "Integration"), pAttribute = "word")

  debates <- partitionBundle(
    "PLPRBTTXT", sAttribute = "text_date", values = NULL,
    regex = TRUE, mc = FALSE, verbose = FALSE
  )
  y <- count(debates, query = "Arbeit", pAttribute = "word")
  y <- count(debates, query = c("Arbeit", "Migration", "Zukunft"), pAttribute = "word")
  
# }