Tools for mining CWB corpora.
polmineR()
The package provides functions for basic text statistics on corpora that are managed by the Corpus Workbench (CWB). A core feature is the generation of subcorpora/partitions based on metadata. The package is also meant to serve as an interface between the CWB and R packages that implement more sophisticated statistical procedures (e.g. lsa, lda, topicmodels) or provide further functionality for text mining (e.g. tm).
Any analysis using this package will usually start with setting up a subcorpus/partition (with partition). A set of partitions can be generated with partitionBundle. Once a partition or a set of partitions has been set up, core functions are cooccurrences and features. Based on a partition bundle, a term-document matrix (class 'TermDocumentMatrix' from the tm package) can be generated (with as.TermDocumentMatrix). This opens the door to the wealth of statistical methods implemented in R.
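To make the idea of a term-document matrix concrete, the following is a minimal base R sketch. It does not call polmineR and is not how as.TermDocumentMatrix is implemented; the list docs is an invented stand-in for a set of partitions, and all names and tokens in it are assumptions made for illustration.

```r
# Toy stand-in for a set of partitions: one token vector per "document".
# Names and tokens are invented for illustration only.
docs <- list(
  speech_a = c("migration", "policy", "debate", "policy"),
  speech_b = c("budget", "debate", "budget", "budget")
)

# Long format: one row per token occurrence, labelled with its document.
tokens <- data.frame(
  term = unlist(docs, use.names = FALSE),
  doc  = rep(names(docs), lengths(docs)),
  stringsAsFactors = FALSE
)

# Cross-tabulate terms (rows) against documents (columns) and drop the
# 'table' class so that a plain integer matrix remains.
tdm <- unclass(table(term = tokens$term, doc = tokens$doc))

print(tdm)
```

Each row is a term and each column a document, which is the layout that matrix-based statistical procedures in R typically expect as input.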
When the package is loaded and attached, it first looks for a file named 'polmineR.conf' in the directory defined by the environment variable 'POLMINER_DIR' and takes general settings for polmineR from that file. Second, templates are restored.
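Whether the configuration file would be found can be checked by hand before loading the package. The sketch below uses base R only; find_polminer_conf is a hypothetical helper written for this example, not a polmineR function, and it only tests for the file's existence without reading any settings.

```r
# Hypothetical helper (not part of polmineR): return the path of
# 'polmineR.conf' inside the directory named by POLMINER_DIR, or NA
# if the variable is unset or the file does not exist.
find_polminer_conf <- function(dir = Sys.getenv("POLMINER_DIR", unset = NA)) {
  # Unset or empty variable: nothing to look up.
  if (is.na(dir) || !nzchar(dir)) {
    return(NA_character_)
  }
  conf_file <- file.path(dir, "polmineR.conf")
  if (file.exists(conf_file)) conf_file else NA_character_
}

conf <- find_polminer_conf()
if (is.na(conf)) {
  message("No polmineR.conf found via POLMINER_DIR")
} else {
  message("Configuration file found: ", conf)
}
```

The environment variable can be set for the current session with Sys.setenv(POLMINER_DIR = "...") before the package is attached; the directory itself depends on the local installation.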
Jockers, Matthew L. (2014): Text Analysis with R for Students of Literature. Cham et al.: Springer.
Baker, Paul (2006): Using Corpora in Discourse Analysis. London: Continuum.