polmineR (version 0.7.0)

encode,data.frame-method: Encode CWB Corpus.

Description

Encode a data.frame as an indexed CWB (Corpus Workbench) corpus.

Usage

## S4 method for signature 'data.frame'
encode(.Object, name, pAttributes = "word", sAttributes = NULL,
  registry = Sys.getenv("CORPUS_REGISTRY"), indexedCorpusDir = NULL,
  verbose = TRUE)

Arguments

.Object
a data.frame to encode
name
name of the (new) CWB corpus
pAttributes
columns of .Object that hold token-level annotation and will be encoded as positional attributes (such as word/pos/lemma)
sAttributes
columns of .Object that will be encoded as structural attributes
registry
path to the corpus registry
indexedCorpusDir
parent directory in which the directory for the indexed corpus files will be created
verbose
logical, whether to be verbose
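
The arguments above imply a token-level input: one row per token, with the pAttributes and sAttributes given as columns. A minimal sketch of such an input follows, in base R only. The toy data, the column values and the corpus name "toy" are hypothetical and chosen for illustration. The encode() call itself is left commented out because it presumably requires a working CWB setup and a corpus registry.

```r
# Hypothetical toy input: one row per token, in corpus order.
tokens <- data.frame(
  word = c("Oil", "prices", "rose", "OPEC", "met", "today"),
  pos = c("NN", "NNS", "VBD", "NNP", "VBD", "NN"),
  language = rep("en", 6),
  places = c(rep("usa", 3), rep("kuwait|uae", 3)),
  stringsAsFactors = FALSE
)

p_attrs <- c("word", "pos")         # token-level columns
s_attrs <- c("language", "places")  # document-level metadata columns

# Check that every requested attribute exists as a character column.
stopifnot(all(c(p_attrs, s_attrs) %in% colnames(tokens)))
stopifnot(all(vapply(tokens[c(p_attrs, s_attrs)], is.character, logical(1))))

# The default for 'registry' is read from this environment variable;
# Sys.getenv() returns "" when the variable is unset.
registry <- Sys.getenv("CORPUS_REGISTRY")
if (!nzchar(registry)) message("CORPUS_REGISTRY is not set")

## Not run:
# library(polmineR)
# encode(tokens, name = "toy", pAttributes = p_attrs, sAttributes = s_attrs)
## End(Not run)
```

Running the checks before encoding surfaces missing or non-character columns early, before any files are written to the registry or the indexed corpus directory.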

Examples

## Not run: 
# library(tm)
# library(tibble)
# library(tidytext)
# library(plyr)
# reut21578 <- system.file("texts", "crude", package = "tm")
# reuters.tm <- VCorpus(DirSource(reut21578), list(reader = readReut21578XMLasPlain))
# reuters.tibble <- tidy(reuters.tm)
# reuters.tibble[["topics_cat"]] <- sapply(
#   reuters.tibble[["topics_cat"]],
#   function(x) paste(x, collapse = "|")
# )
# reuters.tibble[["places"]] <- sapply(
#  reuters.tibble[["places"]],
#  function(x) paste(x, collapse = "|")
# )
# reuters.tidy <- unnest_tokens(
#   reuters.tibble, output = "word", input = "text", to_lower = FALSE
#   )
# encode(reuters.tidy, name = "reuters", sAttributes = c("language", "places"))
## End(Not run)