Corpus: Corpus class.

Description

The R6 Corpus class offers a set of methods to retrieve and manage CWB-indexed corpora.

Usage

Corpus

Format

An object of class R6ClassGenerator of length 24.

Fields

corpus

character vector (length 1), a CWB corpus

encoding

encoding of the corpus (typically 'UTF-8' or 'latin1'), assigned automatically upon initialization of the corpus

cpos

a two-column matrix with regions of a corpus underlying the s-attributes of the data.table in field s_attributes

s_attributes

a data.table with the values of a set of s-attributes

stat

a data.table with counts
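The relation between the cpos and s_attributes fields can be pictured with a small toy example. This is only a sketch under stated assumptions: the values are invented, a base R data.frame stands in for the data.table, and it assumes that row i of the region matrix corresponds to row i of the s-attribute table and that both boundaries of a region are inclusive.

```r
# Toy data only: regions and s-attribute values are invented for illustration.
# Two-column matrix of regions (left and right corpus positions).
cpos <- matrix(
  c(0L, 99L,
    100L, 249L,
    250L, 399L),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("cpos_left", "cpos_right"))
)

# Stand-in for the s_attributes table (a data.frame instead of a data.table).
s_attributes <- data.frame(
  id = c("127", "144", "191"),
  places = c("usa", "usa", "canada"),
  stringsAsFactors = FALSE
)

# Assumption: one row of s-attribute values per region.
stopifnot(nrow(cpos) == nrow(s_attributes))

# Select the regions whose 'places' value is "usa".
usa_regions <- cpos[s_attributes$places == "usa", , drop = FALSE]

# Size of the selection in tokens, assuming inclusive boundaries.
usa_size <- sum(usa_regions[, "cpos_right"] - usa_regions[, "cpos_left"] + 1L)
```

Under these assumptions, subsetting by s-attribute values amounts to filtering the table and keeping the matching rows of the region matrix.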

Arguments

corpus: name of a CWB corpus (character vector of length 1)
registryDir: the directory where the registry file resides
dataDir: the data directory of the corpus
p_attribute: the p-attribute to count
s_attributes: the s-attributes whose values are retrieved
decode: logical, whether to turn token ids into strings upon counting
as.html: logical, used by getInfo()

Methods

initialize(corpus, p_attribute = NULL, s_attributes = NULL): Initialize a new object of class Corpus.
count(p_attribute = getOption("polmineR.p_attribute"), decode = TRUE): Perform counts.
as.partition(): Turn the Corpus object into a partition.
getInfo(as.html = FALSE)
showInfo()
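A minimal sketch of calling count() on an existing object, using only the signature listed above. It assumes that polmineR is installed, that the installed version still exports the Corpus class, and that the bundled REUTERS corpus is available. Whether count() stores its result in the stat field or only returns it is also an assumption here, so the sketch keeps the return value and then inspects the field.

```r
# Run only if polmineR is installed and still provides the R6 Corpus class.
if (requireNamespace("polmineR", quietly = TRUE) &&
    exists("Corpus", envir = asNamespace("polmineR"))) {
  library(polmineR)
  use("polmineR")  # activate the corpora shipped with the package

  REUTERS <- Corpus$new("REUTERS")

  # Count tokens of the p-attribute "word", decoding token ids into strings.
  cnt <- REUTERS$count(p_attribute = "word", decode = TRUE)

  # Assumption: the counts are also kept in the stat field.
  print(head(REUTERS$stat))
}
```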

Examples

# NOT RUN {
library(polmineR)

# activate the corpora shipped with the polmineR package
use("polmineR")

# get and show the info file of the corpus
REUTERS <- Corpus$new("REUTERS")
infofile <- REUTERS$getInfo()
if (interactive()) REUTERS$showInfo()

# use Corpus class to manage counts
REUTERS <- Corpus$new("REUTERS", p_attribute = "word")
REUTERS$stat

# use Corpus class for creating partitions
REUTERS <- Corpus$new("REUTERS", s_attributes = c("id", "places"))
usa <- partition(REUTERS, places = "usa")
sa <- partition(REUTERS, places = "saudi-arabia", regex = TRUE)

# turn the whole Corpus object into a partition
reut <- REUTERS$as.partition()
# }
