encode

encode,data.frame-method

encode,data.table-method

encode,regions-method

encode,character-method

.Object: a <code>data.frame</code> to encode

name

pAttributes: columns of .Object with tokens (such as word/pos/lemma)

sAttributes: columns of .Object that will be encoded as structural attributes

registry

dataDir: data directory for indexed corpus files

verbose

corpus

sAttribute

values

pAttribute

If <code>.Object</code> is a <code>data.frame</code>, it must have a column named 'word' that holds the
token stream, plus further columns that hold either p-attributes or
s-attributes. The corpus is encoded successively, starting with the
p-attributes.
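
The input layout described above can be sketched in base R. This is a minimal illustration, not the package's own example: the column names 'pos', 'lemma' and 'text_id', the corpus name "TOYCORPUS" and the shape of the commented-out call are assumptions. Only the 'word' column and the argument names are taken from this page. The second half shows how an s-attribute column could be read as regions of corpus positions, assuming 0-based positions as used by the Corpus Workbench.

```r
# Toy token table, one row per token. The 'word' column is required by the
# description above; 'pos', 'lemma' and 'text_id' are illustrative names.
tokens <- data.frame(
  word    = c("The", "cat", "sleeps", ".", "Dogs", "bark", "."),
  pos     = c("DT", "NN", "VBZ", ".", "NNS", "VBP", "."),
  lemma   = c("the", "cat", "sleep", ".", "dog", "bark", "."),
  text_id = c("doc1", "doc1", "doc1", "doc1", "doc2", "doc2", "doc2"),
  stringsAsFactors = FALSE
)

# Which columns are positional and which are structural attributes.
p_attrs <- c("word", "pos", "lemma")
s_attrs <- "text_id"

# Basic sanity checks before attempting to encode anything.
stopifnot("word" %in% colnames(tokens))
stopifnot(all(c(p_attrs, s_attrs) %in% colnames(tokens)))

# An s-attribute annotates contiguous regions of the token stream.
# Collapse runs of identical 'text_id' values into left/right positions
# (0-based, an assumption that follows the Corpus Workbench convention).
runs   <- rle(tokens$text_id)
ends   <- cumsum(runs$lengths) - 1L
starts <- ends - runs$lengths + 1L
regions <- data.frame(
  cpos_left  = starts,
  cpos_right = ends,
  value      = runs$values,
  stringsAsFactors = FALSE
)
print(regions)

# Hypothetical call shape (not run); check the Usage of the installed version:
# encode(tokens, name = "TOYCORPUS", pAttributes = p_attrs, sAttributes = s_attrs)
```

Keeping one row per token makes the order of the rows the token stream itself, so the p-attribute columns and the s-attribute regions stay aligned without extra bookkeeping.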

Library for corpus analysis using the Corpus Workbench as an
efficient back end for indexing and querying large corpora. The package offers
functionality to flexibly create partitions and to carry out basic statistical
operations (count, co-occurrences etc.). The original full text of documents
can be reconstructed and inspected at any time. Beyond that, the package is
intended to serve as an interface to packages implementing advanced statistical
procedures. The corresponding data structures (document-term matrices, term
co-occurrence matrices etc.) can be created from the indexed corpora.

encode: Encode s-attribute or corpus.
