as.speeches

A <code>partition</code> object, or a length-one character vector naming a CWB corpus.

.Object

The s-attribute that provides the dates of sessions.

s_attribute_date

The s-attribute that provides the names of speakers.

s_attribute_name

Number of tokens between strucs (argument <code>gap</code>) used to decide
whether a speech has merely been interrupted (by an interjection or question)
or whether separate speeches should be assumed.

Whether to use multicore processing; defaults to <code>FALSE</code>.

A logical value, defaults to <code>TRUE</code>.

verbose

progress

Split an entire corpus or a partition into speeches. The heuristic is to first
split the corpus/partition into one partition per day, using the s-attribute
provided by <code>s_attribute_date</code>. These subcorpora are then split into
speeches by speaker name, using the s-attribute given by
<code>s_attribute_name</code>. If the gap between two contributions of a speaker
is larger than the number of tokens supplied by argument <code>gap</code>, the
contributions are assumed to be two separate speeches.
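
The gap heuristic described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper <code>split_by_gap</code>, the column names <code>left</code> and <code>right</code>, and the gap value of 500 are hypothetical. The sketch assumes the regions (strucs) of one speaker on one day are given by the corpus positions of their first and last tokens, sorted in ascending order.

```r
# Hypothetical helper, not part of polmineR: assign a speech number to each
# region of one speaker on one day, based on the token gap between regions.
split_by_gap <- function(left, right, gap) {
  # left / right: corpus positions of the first / last token of each region,
  # sorted in ascending order (an assumption of this sketch).
  stopifnot(length(left) == length(right), length(left) > 0)
  # Number of tokens between the end of one region and the start of the next.
  distance <- left[-1] - right[-length(right)] - 1
  # A new speech starts wherever that distance exceeds the gap;
  # the cumulative sum turns the break points into running speech numbers.
  cumsum(c(TRUE, distance > gap))
}

# Made-up regions for illustration only.
regions <- data.frame(
  left  = c(0, 120, 900, 1000),
  right = c(100, 300, 950, 1200)
)
regions$speech <- split_by_gap(regions$left, regions$right, gap = 500)
print(regions)
```

In this made-up data, the first two regions are 19 tokens apart and are treated as one interrupted speech, whereas the 599 tokens before the third region exceed the gap and start a second speech.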

Library for corpus analysis using the Corpus Workbench as an
efficient back end for indexing and querying large corpora. The package offers
functionality to flexibly create partitions and to carry out basic statistical
operations (count, co-occurrences etc.). The original full text of documents
can be reconstructed and inspected at any time. Beyond that, the package is
intended to serve as an interface to packages implementing advanced statistical
procedures. Respective data structures (document term matrices, term
co-occurrence matrices etc.) can be created based on the indexed corpora.

as.speeches: Split corpus or partition into speeches.
