
polmineR (version 0.7.9)

as.speeches: Split corpus or partition into speeches.

Description

Split an entire corpus or a partition into speeches. The heuristic first splits the corpus/partition into subcorpora on a day-to-day basis, using the s-attribute provided by s_attribute_date. These subcorpora are then split into speeches by speaker name, using the s-attribute s_attribute_name. If the gap between two contributions of a speaker is larger than the number of tokens supplied by argument gap, the contributions are assumed to be two separate speeches.
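The heuristic described above can be sketched in base R without a CWB corpus. This is a toy illustration under stated assumptions, not the package's implementation: the `regions` table, its column names (`cpos_left`, `cpos_right`) and the helper `split_speeches()` are hypothetical, and the exact way polmineR measures the distance between two regions may differ.

```r
# Hypothetical toy data: one row per region (struc), i.e. a contiguous
# stretch of tokens attributed to one speaker on one day.
regions <- data.frame(
  date = c("2009-10-27", "2009-10-27", "2009-10-27", "2009-10-27"),
  speaker = c("A", "B", "A", "A"),
  cpos_left = c(0, 100, 120, 2000),
  cpos_right = c(99, 119, 400, 2300),
  stringsAsFactors = FALSE
)

# Hypothetical helper illustrating the day / speaker / gap logic.
split_speeches <- function(regions, gap = 500) {
  # Steps 1 and 2: group regions by day, then by speaker.
  groups <- split(regions, list(regions$date, regions$speaker), drop = TRUE)
  out <- lapply(groups, function(df) {
    df <- df[order(df$cpos_left), ]
    # Tokens lying between the end of the previous region and the start
    # of this one (assumption: counted as left - previous right - 1).
    distance <- df$cpos_left - c(NA, head(df$cpos_right, -1)) - 1
    # Step 3: the first region, or a distance larger than gap, opens a new speech.
    new_speech <- is.na(distance) | distance > gap
    df$speech_no <- cumsum(new_speech)
    # Name pattern: speaker, date and index, concatenated by underscores.
    df$speech_id <- paste(df$speaker, df$date, df$speech_no, sep = "_")
    df
  })
  result <- do.call(rbind, out)
  rownames(result) <- NULL
  result
}

result <- split_speeches(regions, gap = 500)
print(result[, c("speaker", "cpos_left", "cpos_right", "speech_id")])
```

In this toy data, the short interruption by speaker B leaves the first two regions of speaker A in one speech, while the third region of A follows after more than 500 tokens and is therefore treated as a second speech.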

Usage

as.speeches(.Object,
  s_attribute_date = grep("date", s_attributes(.Object), value = TRUE),
  s_attribute_name = grep("name", s_attributes(.Object), value = TRUE),
  gap = 500, mc = FALSE, verbose = TRUE, progress = TRUE)

Arguments

.Object

A partition, or length-one character vector indicating a CWB corpus.

s_attribute_date

The s-attribute that provides the dates of sessions.

s_attribute_name

The s-attribute that provides the names of speakers.

gap

Number of tokens between strucs that decides whether a speech is assumed to have merely been interrupted (by an interjection or question) or whether separate speeches are assumed.

mc

Whether to use multicore, defaults to FALSE.

verbose

A logical value, defaults to TRUE.

progress

A logical value, defaults to TRUE.

Value

A partition_bundle. The names of the objects in the bundle consist of the speaker name, the date of the speech and an index for the number of the speech on a given day, concatenated by underscores.
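Because the names follow the pattern speaker, date and index joined by underscores, they can be taken apart again to recover metadata. The following is a minimal sketch in base R; the vector `speech_names` holds made-up names for illustration (with a real bundle they would presumably be obtained via `names(speeches)`), and the regular expression assumes that neither the date nor the index contains an underscore.

```r
# Hypothetical names, following the documented pattern speaker_date_index.
speech_names <- c(
  "Angela Merkel_2009-10-27_1",
  "Angela Merkel_2009-10-27_2",
  "Volker Kauder_2009-10-28_1"
)

# Anchor the match at the end of the string, so that the last two
# underscore-separated fields are read as date and index.
pattern <- "^(.*)_([^_]+)_([0-9]+)$"

meta <- data.frame(
  speaker = sub(pattern, "\\1", speech_names),
  date = as.Date(sub(pattern, "\\2", speech_names)),
  index = as.integer(sub(pattern, "\\3", speech_names)),
  stringsAsFactors = FALSE
)
print(meta)
```

Matching from the right keeps the sketch robust if a speaker name itself happens to contain an underscore.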

Examples

# NOT RUN {
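library(polmineR)
use("polmineR") # make the corpora included in the polmineR package available

# Split the entire corpus into speeches
speeches <- as.speeches(
  "GERMAPARLMINI",
  s_attribute_date = "date", s_attribute_name = "speaker"
)
# Count tokens per speech and turn the counts into a term-document matrix
speeches_count <- count(speeches, p_attribute = "word")
tdm <- as.TermDocumentMatrix(speeches_count, col = "count")

# Split a partition covering a single day into speeches
bt <- partition("GERMAPARLMINI", date = "2009-10-27")
speeches <- as.speeches(bt, s_attribute_name = "speaker")
summary(speeches)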
# }