polmineR (version 0.7.9)

Corpus: Corpus class.

Description

The R6 Corpus class offers a set of methods to retrieve and manage corpora indexed with the Corpus Workbench (CWB).

Usage

Corpus

Format

An object of class R6ClassGenerator of length 24.

Fields

corpus

character vector (length 1), a CWB corpus

encoding

encoding of the corpus (typically 'UTF-8' or 'latin1'), assigned automatically upon initialization of the corpus

cpos

a two-column matrix of corpus positions giving the left and right boundaries of the corpus regions that underlie the s-attributes in the data.table in field s_attributes

s_attributes

a data.table with the values of a set of s-attributes

stat

a data.table with counts
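
To make the shape of the cpos field more concrete, here is a minimal base R sketch of a two-column region matrix. The values are made up for illustration and are not taken from any real corpus; the sketch assumes, as is usual for CWB regions, that both boundaries are inclusive.

```r
# Hypothetical region matrix: one row per region,
# column 1 = left corpus position, column 2 = right corpus position
cpos <- matrix(
  c(0L, 99L,
    100L, 249L,
    250L, 299L),
  ncol = 2, byrow = TRUE
)

# Number of tokens in each region (assuming inclusive boundaries)
region_sizes <- cpos[, 2] - cpos[, 1] + 1L

# Total number of tokens covered by all regions
total_size <- sum(region_sizes)
```

Each row corresponds to one region, so a matrix like this would line up row by row with a table of s-attribute values.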

Arguments

corpus

character vector (length 1), the ID of a CWB corpus

registryDir

the directory where the registry file resides

dataDir

the data directory of the corpus

p_attribute

the p-attribute on which to perform the count

s_attributes

the s-attributes whose values fill the data.table in field s_attributes

decode

logical, whether to turn token ids into strings upon counting

as.html

logical, whether to return the corpus info as HTML
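
The decode argument above distinguishes counts keyed by token id from counts keyed by token string. The following base R sketch illustrates that idea only; it is not how polmineR implements it. The lexicon and the token ids are hypothetical, and the ids are assumed to be 0-based, as they are in CWB.

```r
# Hypothetical lexicon: position i holds the string for token id i - 1
lexicon <- c("oil", "price", "the")

# Hypothetical token stream stored as integer ids
token_ids <- c(2L, 0L, 1L, 2L, 0L, 2L)

# Count occurrences per id (conceptually, the result without decoding)
counts_by_id <- table(token_ids)
ids <- as.integer(names(counts_by_id))

# Decoding step: map ids to strings; + 1L because R vectors are 1-based
counts <- data.frame(
  id = ids,
  token = lexicon[ids + 1L],
  count = as.integer(counts_by_id),
  stringsAsFactors = FALSE
)
```

Skipping the decoding step leaves only the id and count columns, which avoids the lookup of strings when only the numbers are needed.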

Methods

initialize(corpus, p_attribute = NULL, s_attributes = NULL)

Initialize a new object of class Corpus.

count(p_attribute = getOption("polmineR.p_attribute"), decode = TRUE)

Perform counts.

as.partition()

Turn the Corpus object into a partition.

getInfo(as.html = FALSE)

Get the info file of the corpus, optionally as HTML.

showInfo()

Show the info file of the corpus.

Examples

# NOT RUN {
library(polmineR)
use("polmineR") # activate the corpora shipped with the polmineR package
REUTERS <- Corpus$new("REUTERS")
infofile <- REUTERS$getInfo()
if (interactive()) REUTERS$showInfo()

# use Corpus class to manage counts
REUTERS <- Corpus$new("REUTERS", p_attribute = "word")
REUTERS$stat

# use Corpus class for creating partitions
REUTERS <- Corpus$new("REUTERS", s_attributes = c("id", "places"))
usa <- partition(REUTERS, places = "usa")
sa <- partition(REUTERS, places = "saudi-arabia", regex = TRUE)

reut <- REUTERS$as.partition()
# }