polmineR (version 0.6.1)

partition: Initialize a partition

Description

Set up an object of the partition class. Frequency lists are computed and kept in the stat-slot if pAttribute is not NULL.

Usage

partition(.Object, ...)
"partition"(.Object, def = NULL, name = c(""), encoding = NULL, pAttribute = NULL, meta = NULL, regex = FALSE, xml = "flat", id2str = TRUE, type = NULL, mc = FALSE, verbose = TRUE, ...)
"partition"(.Object, ...)
"partition"(.Object)
"partition"(.Object, def = NULL, name = c(""), regex = FALSE, pAttribute = NULL, id2str = TRUE, type = NULL, verbose = TRUE, mc = FALSE, ...)

Arguments

.Object
character-vector - the CWB-corpus to be used
...
parameters passed into the partition-method
def
list consisting of a set of character vectors (see details and examples)
name
name of the new partition, defaults to "noName"
encoding
encoding of the corpus (typically "LATIN1" or "UTF-8"); if NULL, the encoding provided in the registry file of the corpus (charset="...") will be used
pAttribute
the pAttribute(s) for which term frequencies shall be retrieved
meta
a character vector
regex
logical (defaults to FALSE); if TRUE, the s-attributes provided will be handled as regular expressions, and each character vector with s-attribute values then needs to have length 1
xml
either 'flat' (default) or 'nested'
id2str
whether to turn token ids to strings (set FALSE to minimize object.size / memory consumption)
type
character vector (length 1) specifying the type of corpus / partition (e.g. "plpr")
mc
whether to use multicore (for counting terms)
verbose
logical, defaults to TRUE

Value

An object of the S4 class 'partition'

Details

The function sets up a partition based on a list of s-attributes with respective values. The s-attributes defining the partition are given as a list, e.g. list(text_type="speech", text_year="2013"). The values of the list may contain regular expressions. To use regular expression syntax, set the parameter regex to TRUE. Regular expressions are passed into grep, i.e. the regex syntax used in R needs to be used (double backslashes etc.).
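Since the patterns are passed into grep, they follow R's regular expression rules. The following is a minimal base-R sketch of those rules, not a call to partition itself; the vector of speaker names is made up for illustration and does not come from any corpus.

```r
# Hypothetical s-attribute values; not taken from a real corpus
speakers <- c("Angela Dorothea Merkel", "Volker Kauder", "Dr. Angela Merkel")

# grep is unanchored, so ".*Merkel" matches every value containing "Merkel"
merkel_hits <- grep(".*Merkel", speakers, value = TRUE)

# A literal dot must be escaped; inside an R string the backslash is doubled
dr_hits <- grep("^Dr\\.", speakers, value = TRUE)

print(merkel_hits)
print(dr_hits)
```

Testing a pattern against a small vector like this before passing it to partition with regex=TRUE can help catch escaping mistakes early.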

The XML imported into the CWB may be "flat" or "nested". This needs to be indicated with the parameter xml (default is "flat"). If you generate a partition based on a flat XML structure, some performance gain may be achieved when ordering the sAttributes with decreasingly restrictive conditions. If you have a nested XML, it is mandatory that the order of the sAttributes provided reflects the hierarchy of the XML: the top-level elements need to be positioned at the beginning of the list with the s-attributes, the most restrictive elements at the end.

If pAttribute is not NULL, a count of tokens in the corpus will be performed and kept in the stat-slot of the partition-object. The length of the pAttribute character vector may be 1 or more. If two or more p-attributes are provided, the occurrence of combinations will be counted. A typical scenario is to combine the p-attributes "word" or "lemma" and "pos".
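To illustrate what counting combinations of p-attributes means, here is a small base-R sketch on a toy token table. It is only an analogy under stated assumptions: the data frame and its values are invented, and the code does not reproduce how polmineR computes or stores the counts in the stat-slot.

```r
# Toy token stream with two hypothetical p-attributes; not real corpus data
tokens <- data.frame(
  word = c("die", "Rede", "die", "Rede", "reden"),
  pos  = c("ART", "NN", "ART", "NN", "VVINF"),
  stringsAsFactors = FALSE
)

# Add a helper column of ones, then sum it per distinct word/pos combination
tokens$count <- 1L
counts <- aggregate(count ~ word + pos, data = tokens, FUN = sum)

print(counts)
```

Each row of the result stands for one word/pos pair, so the same word form tagged with different parts of speech would be counted separately.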

Examples

if (require(polmineR.sampleCorpus) && require(rcqp)){
   use(polmineR.sampleCorpus)
   spd <- partition(
     "PLPRBTTXT", text_party="SPD", text_type="speech"
     )
   kauder <- partition(
     "PLPRBTTXT", text_name="Volker Kauder", pAttribute="word"
     )
   merkel <- partition(
     "PLPRBTTXT", text_name=".*Merkel",
     pAttribute="word", regex=TRUE
     )
   sAttributes(merkel, "text_date")
   sAttributes(merkel, "text_name")
   merkel <- partition(
     "PLPRBTTXT", text_name="Angela Dorothea Merkel",
     text_date="2009-11-10", text_type="speech", pAttribute="word"
     )
   merkel <- subset(merkel, !word %in% punctuation)
   merkel <- subset(merkel, !word %in% tm::stopwords("de"))
}
