polmineR (version 0.7.5)

encode: Encode s-attribute or corpus.

Description

If .Object is a data.frame, it must have a column with the token stream (column name 'word'); further columns hold either p-attributes or s-attributes. The corpus is encoded successively, starting with the p-attributes.

Usage

encode(.Object, ...)

# S4 method for data.frame
encode(.Object, name, pAttributes = "word", sAttributes = NULL, registry = Sys.getenv("CORPUS_REGISTRY"), dataDir = NULL, verbose = TRUE)

# S4 method for data.table
encode(.Object, corpus, sAttribute, verbose = TRUE)

# S4 method for regions
encode(.Object, sAttribute, values, verbose = TRUE)

# S4 method for character
encode(.Object, corpus, pAttribute = NULL, dataDir = NULL, verbose = TRUE)

Arguments

.Object

the object to encode: a data.frame, a data.table, a regions object or a character vector

...

further parameters

name

name of the (new) CWB corpus

pAttributes

columns of .Object with tokens (such as word/pos/lemma)

sAttributes

columns of .Object that will be encoded as structural attributes

registry

path to the corpus registry

dataDir

data directory for indexed corpus files

verbose

logical, whether to be verbose

corpus

the name of the CWB corpus

sAttribute

a single s-attribute

values

values for regions

pAttribute

a single p-attribute

Details

If .Object is a data.table, it is assumed to have three columns: the left corpus position, the right corpus position and the value of an s-attribute that will be encoded. The method is used to add s-attributes to a corpus.
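The three-column layout described above can be sketched as follows. This is a minimal illustration, not taken from the package: the column names (cpos_left, cpos_right, value), the corpus name "REUTERS2" and the s-attribute name "topic" are hypothetical, and the checks only encode the assumption that regions are well-formed and do not overlap. The encode() call is left commented out because it needs polmineR and an existing CWB corpus.

```r
library(data.table)

# Hypothetical region table: one row per region of the corpus.
# Column order follows the description: left position, right position, value.
regions_dt <- data.table(
  cpos_left  = c(0L, 120L, 305L),
  cpos_right = c(119L, 304L, 512L),
  value      = c("politics", "economy", "sports")
)

# Sanity checks before encoding (assumptions, not enforced by this sketch's source):
# three columns, left <= right, and no region starts before the previous one ends.
stopifnot(
  ncol(regions_dt) == 3L,
  all(regions_dt[["cpos_left"]] <= regions_dt[["cpos_right"]]),
  all(head(regions_dt[["cpos_right"]], -1L) < tail(regions_dt[["cpos_left"]], -1L))
)

# Not run: requires polmineR and an existing corpus; names are hypothetical.
# encode(regions_dt, corpus = "REUTERS2", sAttribute = "topic")
```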

If .Object is a (character) vector, there are two usages. If the corpus defined by the parameter corpus does not yet exist, the vector is taken as the word token stream, and a new registry file and a new data directory will be generated. If the corpus already exists, a new p-attribute will be added to the pre-existing corpus.
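The two usages of the character method might look like the sketch below. The token and tag vectors, the corpus name "MINI" and the p-attribute name "pos" are hypothetical; the length check reflects the assumption that an added p-attribute must supply one value per token of the existing corpus. The encode() calls are commented out because they need polmineR and a working CWB setup.

```r
# Hypothetical token stream and a parallel vector of part-of-speech tags.
tokens   <- c("Oil", "prices", "rose", "sharply", ".")
pos_tags <- c("NN", "NNS", "VBD", "RB", ".")

# Assumption: the new p-attribute needs exactly one value per token.
stopifnot(length(tokens) == length(pos_tags))

# Not run: requires polmineR; corpus and attribute names are hypothetical.
# First usage: the corpus does not exist yet, so the vector becomes the word token stream.
# encode(tokens, corpus = "MINI")
# Second usage: the corpus now exists, so the vector is added as a new p-attribute.
# encode(pos_tags, corpus = "MINI", pAttribute = "pos")
```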

Examples

# NOT RUN {
library(polmineR)
library(tm)
reut21578 <- system.file("texts", "crude", package = "tm")
reuters.tm <- VCorpus(DirSource(reut21578), list(reader = readReut21578XMLasPlain))

library(tidytext)
reuters.tibble <- tidy(reuters.tm)
reuters.tibble[["topics_cat"]] <- sapply(
  reuters.tibble[["topics_cat"]],
  function(x) paste(x, collapse = "|")
)
reuters.tibble[["places"]] <- sapply(
  reuters.tibble[["places"]],
  function(x) paste(x, collapse = "|")
)
reuters.tidy <- unnest_tokens(
  reuters.tibble, output = "word", input = "text", to_lower = FALSE
)
encode(
  reuters.tidy, name = "reuters2",
  sAttributes = c("language", "places")
)
# }