polmineR (version 0.7.11)

decode: Decode structural attribute, partition or corpus.

Description

Decodes a corpus or a partition and returns the result as a data.table. The returned data.table can easily be coerced to a tibble and processed using tidytext approaches.

Usage

decode(.Object, s_attribute = NULL, verbose = TRUE, ...)

Arguments

.Object

The corpus to decode (a character vector giving the corpus name), or a partition object.

s_attribute

The s-attribute(s) to decode (a character vector), or NULL to decode the token stream.

verbose

Logical value, whether to output messages.

...

Further arguments.

Value

The return value is a data.table.

Details

If s_attribute is a character vector naming one or several structural attributes, the return value is a data.table with the left and right corpus positions in the first and second columns ("cpos_left" and "cpos_right"). The remaining columns hold the decoded values of the s-attributes, and each column is named after its s-attribute. An error is thrown if the structural attributes differ in length (i.e. if the data structure is nested).
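The shape of this return value can be illustrated with a small sketch. The table below is built by hand as a stand-in for the result of decode("REUTERS", s_attribute = "id"); the corpus positions and ids are invented for illustration and are not taken from the actual corpus. It relies on the CWB convention that a region's left and right corpus positions are both inclusive.

```r
library(data.table)

# Hand-built stand-in for a decoded s-attribute table.
# The values are hypothetical; only the column layout follows the description.
regions <- data.table(
  cpos_left  = c(0L, 92L, 533L),
  cpos_right = c(91L, 532L, 625L),
  id         = c("127", "144", "191")
)

# Both corpus positions are inclusive, so the number of tokens
# in a region is right - left + 1.
regions[, size := cpos_right - cpos_left + 1L]

print(regions)
```

With a real corpus, the same size calculation should apply to the table returned by decode, provided the first two columns are named as described above.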

If s_attribute is NULL, the token stream is decoded for all positional attributes that are present. Structural attributes are reported in additional columns. Decoding the entire corpus can be useful as a step towards processing the data following the 'tidy' approach, or for manipulating the corpus data and re-encoding the corpus.
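A minimal sketch of the 'tidy' transition, using a hand-built table in place of a decoded token stream. The column names "cpos", "word" and "id" and all values are assumptions made for illustration; check colnames() on the actual return value before relying on them. The sketch also assumes the tibble package is installed.

```r
library(data.table)

# Hand-built stand-in for a decoded token stream: one row per token.
# Column names and values are hypothetical.
tokens <- data.table(
  cpos = 0:5,
  word = c("oil", "prices", "oil", "oil", "output", "fell"),
  id   = c("127", "127", "127", "144", "144", "144")
)

# One token per row is the layout tidytext works with,
# so the table can be coerced to a tibble directly.
tokens_tbl <- tibble::as_tibble(tokens)

# Token counts per document, computed on the data.table itself.
counts <- tokens[, .N, by = .(id, word)][order(id, -N, word)]

print(counts)
```

Whether to stay with data.table or move to a tibble is a matter of preference; the coercion copies no structural information beyond the columns, so any region boundaries need to be carried as ordinary columns.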

Examples

# NOT RUN {
library(polmineR)
use("polmineR") # activate the sample corpora shipped with the package

# Scenario 1: Decode one or two s-attributes
dt <- decode("REUTERS", s_attribute = "id")
dt <- decode("REUTERS", s_attribute = c("topics_cat", "places"))

# Scenario 2: Decode entire corpus
dt <- decode("REUTERS")

# Scenario 3: Decode partition
p <- partition("REUTERS", places = "kuwait", regex = TRUE)
dt <- decode(p)

# Scenario 4: Decode partition_bundle
pb <- partition_bundle("REUTERS", s_attribute = "id")
dts <- lapply(as.list(pb), decode)
dts <- lapply(names(dts), function(n) dts[[n]][, speech_id := n])
dt <- data.table::rbindlist(dts)
# }