R-package 'polmineR'

Base package of the PolMine-Toolkit

Purpose

The purpose of the package 'polmineR' is to facilitate the interactive analysis of corpora using R. Core objectives for the development of the package are performance, usability, and a modular design.

There are already many tools for text mining. Why yet another one? Important incentives for developing the package were:

  • to create a package that makes the creation and analysis of subcorpora (called 'partitions' here) as easy as possible. A particular strength of the package should be to support contrastive/comparative research.
  • to keep the original text accessible. polmineR is based on the conviction that statistical analysis alone may be blind and deaf.
  • to provide an open source platform that will make text mining more productive, avoiding prohibitive costs of any kind. Some familiarity with R is still necessary, though.

Design

polmineR relies on the Open Corpus Workbench (CWB) as a backend and uses the rcqp package as an interface. The CWB is particularly efficient for storing large corpora and offers a powerful language for querying corpora, the Corpus Query Processor (CQP). The architecture may be overengineered if you work with smaller corpora. It is meant to make working with larger corpora efficient, both locally and on a server.

Background

The polmineR-package was specifically developed to make full use of the XML annotation structure of the corpora created in the PolMine project (see polmine.sowi.uni-due.de). The core PolMine corpora are corpora of plenary protocols. In these corpora, speakers, parties etc. are structurally annotated. The polmineR-package is meant to help make full use of this rich annotation structure.
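To make the idea of structural annotation more concrete, here is a minimal base R sketch. It does not use polmineR or the CWB at all: the token table, the helper names `toy_partition` and `toy_dispersion`, and all values are made up for illustration. It only shows how annotated attributes such as party or year can define a subcorpus and how the frequency of a query can be broken down along one of them.

```r
# Toy token table; every value here is invented for illustration.
tokens <- data.frame(
  word  = c("we", "need", "europe", "europe", "is", "vital",
            "cut", "taxes", "europe", "taxes", "taxes", "now"),
  party = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C", "C"),
  year  = c(2009, 2009, 2009, 2009, 2009, 2009,
            2010, 2010, 2010, 2010, 2010, 2010),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a "partition" in this sketch is simply the set of rows
# whose structural attributes match the requested values.
toy_partition <- function(tokens, ...) {
  def <- list(...)
  keep <- rep(TRUE, nrow(tokens))
  for (a in names(def)) {
    keep <- keep & tokens[[a]] %in% def[[a]]
  }
  tokens[keep, , drop = FALSE]
}

# Hypothetical helper: absolute and relative frequency of a query word
# across the values of one structural attribute.
toy_dispersion <- function(tokens, query, dim) {
  hits <- tapply(tokens$word == query, tokens[[dim]], sum)
  size <- tapply(tokens$word, tokens[[dim]], length)
  data.frame(
    value = names(hits),
    abs   = as.integer(hits),
    rel   = as.numeric(hits / size),
    stringsAsFactors = FALSE
  )
}

p2009 <- toy_partition(tokens, year = 2009)
disp  <- toy_dispersion(tokens, "europe", "party")
print(disp)
```

With real corpora the package delegates this kind of lookup to the CWB rather than to an in-memory data frame, which is what makes it workable for large corpora.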

Core functions

  • partition: Set up a partition (i.e. subcorpus);
  • context: Analyse the context of a query (including some statistics);
  • dispersion: Analyse the dispersion of a query across one or two dimensions (absolute and relative frequencies);
  • compare: Compare two partitions to identify specific vocabulary (using a chi-square test);
  • count: Count features.
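The statistical idea behind comparing two partitions can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the helper `toy_chisquare`, the term counts, and the partition sizes are all made up, and the actual `compare` function may differ in detail. For each term, the sketch builds the 2x2 table (term vs. all other tokens, partition A vs. partition B) and computes the Pearson chi-square statistic without continuity correction.

```r
# Made-up counts of three terms in a partition of interest (A)
# and in a reference partition (B).
count_a <- c(europe = 30, tax = 10, climate = 25)
count_b <- c(europe = 20, tax = 40, climate = 22)
size_a <- 1000  # total number of tokens in A (assumed)
size_b <- 1200  # total number of tokens in B (assumed)

# Hypothetical helper: Pearson chi-square statistic per term.
toy_chisquare <- function(count_a, count_b, size_a, size_b) {
  stopifnot(identical(names(count_a), names(count_b)))
  total <- size_a + size_b
  # Tokens in each partition that are not the term
  rest_a <- size_a - count_a
  rest_b <- size_b - count_b
  # Expected frequencies if the term were distributed proportionally
  exp_a      <- (count_a + count_b) * size_a / total
  exp_b      <- (count_a + count_b) * size_b / total
  exp_rest_a <- (rest_a + rest_b) * size_a / total
  exp_rest_b <- (rest_a + rest_b) * size_b / total
  chi <- (count_a - exp_a)^2 / exp_a +
    (count_b - exp_b)^2 / exp_b +
    (rest_a - exp_rest_a)^2 / exp_rest_a +
    (rest_b - exp_rest_b)^2 / exp_rest_b
  data.frame(
    term = names(count_a),
    count_a = as.numeric(count_a),
    count_b = as.numeric(count_b),
    exp_a = as.numeric(exp_a),
    chisquare = as.numeric(chi),
    stringsAsFactors = FALSE
  )
}

res <- toy_chisquare(count_a, count_b, size_a, size_b)
# Terms with the largest statistic deviate most from the expected share.
res <- res[order(-res$chisquare), ]
print(res)
```

A large statistic only says that a term is unevenly distributed; comparing `count_a` with `exp_a` shows whether it is over- or underrepresented in partition A.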

State of affairs

There are quite a few further functions, some of which are experimental. The publication of the polmineR-package on CRAN is planned as soon as the portability of the package is ensured. Most recent developments will be available here on GitHub.

Installation

In theory, it should be easy to install the package with the devtools mechanism. It has been checked on a preliminary basis that the package is portable, but feedback is most welcome. The tricky part of the installation will usually be the rcqp package. See the package vignette for some advice.
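A minimal sketch of what the devtools route might look like follows. The GitHub location `PolMine/polmineR` and the helper name `install_polmineR_dev` are assumptions made for illustration and are not taken from this page; please verify the repository before use. The helper is only defined, not called, so nothing is installed until you run it yourself.

```r
# Hypothetical helper; defining it does not install anything.
install_polmineR_dev <- function(repo = "PolMine/polmineR") {
  # 'repo' is an assumed GitHub location; check it before running.
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  # rcqp depends on a working CWB setup and is usually the tricky part.
  if (!requireNamespace("rcqp", quietly = TRUE)) {
    stop("Please install 'rcqp' first; see the package vignette for advice.")
  }
  devtools::install_github(repo)
}

# To attempt the installation, run:
# install_polmineR_dev()
```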

Feedback

Feedback is most welcome! I want this to be a useful package not just for me. Please do get in touch: Andreas Blaette, University of Duisburg-Essen (andreas.blaette@uni-due.de).

Install

install.packages('polmineR')

Monthly Downloads

425

Version

0.6.1

License

GPL-3

Maintainer

Andreas Blaette

Last Published

June 23rd, 2016

Functions in polmineR (0.6.1)

  • as.markdown: turn partition into markdown
  • comp-class: S4 class for comparing corpora
  • as.cqp: Convert a string to a CQP query
  • compare: compare features of two partitions
  • blapply: apply a function over a list or bundle with and without verbose parallelization
  • as.speeches: Split partition into speeches
  • browse: display in browser
  • bundle-class: bundle class
  • as.TermDocumentMatrix: as.TermDocumentMatrix / as.DocumentTermMatrix
  • chisquare: perform chi-square test
  • cooccurrencesBundle-class: S4 cooccurrencesBundle class
  • cooccurrencesReshaped: Methods for manipulating cooccurrencesReshaped-class-objects
  • corpus: get corpus
  • cooccurrences-class: cooccurrences
  • contextBundle-class: S4 contextBundle class
  • cooccurrence: find cooccurrences
  • context: Analyze context of a node word
  • context-class: S4 context class
  • count: get counts
  • cooccurrences: get all cooccurrences in a partition
  • cqpserver: start CQP server
  • CQI.super: Interfaces for accessing the CWB
  • enrich: enrich an object
  • adjustEncoding: adjust encoding
  • getTokenStream: get token stream
  • kwic: KWIC output / concordances
  • hits-class: hits class
  • html: restore fulltext as html
  • getTermFrequencies: get term frequencies
  • getEncoding: get the encoding of a corpus
  • flatten: flatten a nested list
  • dotplot: dotplot
  • datesPeriod: generate the sattribute
  • encoding: get/set encoding slot of an object
  • dispersion-class: dispersion class
  • cpos: get corpus positions
  • pAttribute: get pAttribute
  • dispersion: Dispersion of a query or multiple queries
  • partitionBundle-class: partitionBundle class
  • noise: detect noise
  • mail: mail result
  • meta: metainformation
  • partitionBundle: Generate a list of partitions
  • split,partition-method: split partition into partitionBundle
  • tempcorpus-class: S4 class to capture core information on a temporary CWB corpus
  • pAttributes: get available pAttributes
  • sAttributes,character-method: Print S-Attributes in a partition or corpus
  • resetRegistry: reset CORPUS_REGISTRY
  • frequencies: Frequency breakdown of the variation of query results
  • read: Return to the original text and read
  • size: get corpus size
  • scatterplot: word scatterplot
  • getTerms: get terms available in a corpus or partition
  • terms-partition-method: get terms available in a corpus
  • tTest: perform t-test
  • trim: trim an object
  • TermDocumentMatrix: Methods for TermDocumentMatrix / DocumentTermMatrix
  • ll: text statistics
  • use: use corpus
  • textstat-class: S4 textstat class
  • view: browse an object using View()
  • weigh: weigh a matrix
  • kwic-class: kwic (S4 class)
  • ngrams-class: get ngrams
  • partition: Initialize a partition
  • partition-class: partition class
  • means: calculate means
  • name: generic methods defined in the polmineR-package
  • polmineR-package: polmineR-package