polmineR (version 0.7.8)

features,partition-method: Get features by comparison.

Description

Compares the features of two objects, usually a partition defining a corpus of interest (coi) and a partition defining a reference corpus (ref). The most important purpose is term extraction.

Usage

# S4 method for partition
features(x, y, included = FALSE, method = "chisquare",
  verbose = FALSE)

# S4 method for count
features(x, y, by = NULL, included = FALSE, method = "chisquare",
  verbose = TRUE)

# S4 method for partitionBundle
features(x, y, included = FALSE, method = "chisquare",
  verbose = TRUE, mc = getOption("polmineR.mc"), progress = FALSE)

# S4 method for ngrams
features(x, y, included = FALSE, method = "chisquare",
  verbose = TRUE, ...)

Arguments

x

a partition, count, partitionBundle or ngrams object (see Usage), defining the corpus of interest (coi)

y

a partition object defining the reference corpus (ref); it is assumed that the coi is a subcorpus of ref

included

logical, TRUE if the coi is part of ref; defaults to FALSE

method

the statistical test to apply (chisquare or log likelihood)

verbose

logical, whether to print messages; the default differs between methods (see Usage)

by

the columns used for merging; if NULL (default), the pAttribute of x is used

mc

logical, whether to use multicore

progress

logical, whether to show progress; defaults to FALSE

...

further parameters

References

Baker, Paul (2006): Using Corpora in Discourse Analysis. Continuum: London, pp. 121-149 (ch. 6).

Manning, Christopher D.; Schuetze, Hinrich (1999): Foundations of Statistical Natural Language Processing. MIT Press: Cambridge, Mass., pp. 151-189 (ch. 5).

Examples

# NOT RUN {
use("polmineR")

kauder <- partition(
  "GERMAPARLMINI",
  speaker = "Volker Kauder", interjection = "speech",
  pAttribute = "word"
)
all <- partition("GERMAPARLMINI", interjection = "speech", pAttribute = "word")

terms_kauder <- features(x = kauder, y = all, included = TRUE)
top100 <- subset(terms_kauder, rank_chisquare <= 100)
head(top100)

# a different way is to compare count objects
kauder_count <- as(kauder, "count")
all_count <- as(all, "count")
terms_kauder <- features(kauder_count, all_count, included = TRUE)
top100 <- subset(terms_kauder, rank_chisquare <= 100)
head(top100)

speakers <- partitionBundle("GERMAPARLMINI", sAttribute = "speaker")
speakers <- enrich(speakers, pAttribute = "word")
speaker_terms <- features(speakers[[1:5]], all, included = TRUE, progress = TRUE)
dtm <- as.DocumentTermMatrix(speaker_terms, col = "chisquare")
# }
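
To make the comparison more concrete, the following base R sketch computes the two statistics named under method for a single term, from its count in the coi and in ref and the sizes of both corpora. The helper names keyness_chisquare and keyness_ll and the numbers are hypothetical and not part of polmineR. The log likelihood shown is the simplified two-term variant common in corpus linguistics, and subtracting the coi from ref when included = TRUE is an assumption about what that argument implies. The package's own implementation may differ.

```r
# Hypothetical helpers, not part of polmineR.

# Chi-square statistic of the 2 x 2 table (term vs. other tokens, coi vs. ref).
keyness_chisquare <- function(count_coi, size_coi, count_ref, size_ref,
                              included = FALSE) {
  if (included) {
    # Assumption: if the coi is part of ref, remove the coi share so that
    # the two samples are disjoint.
    count_ref <- count_ref - count_coi
    size_ref <- size_ref - size_coi
  }
  # Columns: coi, ref; rows: the term, all other tokens.
  observed <- matrix(
    c(count_coi, size_coi - count_coi,
      count_ref, size_ref - count_ref),
    nrow = 2
  )
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  sum((observed - expected)^2 / expected)
}

# Two-term log likelihood (G2) for disjoint coi and ref.
keyness_ll <- function(count_coi, size_coi, count_ref, size_ref) {
  total <- count_coi + count_ref
  exp_coi <- total * size_coi / (size_coi + size_ref)
  exp_ref <- total * size_ref / (size_coi + size_ref)
  # A zero count contributes nothing to the sum.
  part <- function(obs, exp) ifelse(obs > 0, obs * log(obs / exp), 0)
  2 * (part(count_coi, exp_coi) + part(count_ref, exp_ref))
}

# Made-up numbers: a term occurs 30 times in a coi of 1000 tokens and
# 50 times in a ref of 10000 tokens that contains the coi.
stat <- keyness_chisquare(30, 1000, 50, 10000, included = TRUE)
g2 <- keyness_ll(30, 1000, 50 - 30, 10000 - 1000)
```

High values of either statistic indicate that the relative frequency of the term differs strongly between the coi and ref. Neither statistic says which corpus the term is overrepresented in, so the observed and expected counts still need to be compared.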
