ECB_press_conferences_tokens

The pre-processed and tokenized version of the
ECB_press_conferences corpus. The processing
involved the following steps:<ul>
<li>Subset paragraphs shorter than 10 words</li>
<li>Removal of stop words</li>
<li>Part-of-speech tagging, after which only nouns, proper nouns and
adjectives were retained</li>
<li>Detection and merging of frequent compound words</li>
<li>Frequency-based cleaning of rare and very common words</li>
</ul>
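The steps above can be illustrated with a minimal base R sketch. It is not the code used to build the dataset: the toy paragraphs, the stop word list, the `merge_compounds()` helper and all thresholds are hypothetical, the first step is read as dropping paragraphs shorter than 10 words, and part-of-speech tagging is skipped because it requires an external tagger.

```r
# Hypothetical toy input: one character string per paragraph.
paragraphs <- c(
  "the governing council decided to keep the key interest rates of the euro area unchanged at its meeting today",
  "thank you",
  "inflation in the euro area is expected to remain close to the target over the medium term",
  "the governing council expects interest rates to remain at present levels while inflation stays low across the euro area"
)
# Hypothetical stop word list, for illustration only.
stop_words <- c("the", "to", "of", "in", "is", "at", "its", "over", "while", "across")

# Step 1: tokenize, then drop paragraphs shorter than 10 words (assumed reading).
tokens <- strsplit(tolower(paragraphs), "[^a-z]+")
tokens <- lapply(tokens, function(x) x[nzchar(x)])
tokens <- tokens[lengths(tokens) >= 10]

# Step 2: remove stop words.
tokens <- lapply(tokens, function(x) x[!x %in% stop_words])

# Step 3 (part-of-speech filtering) is omitted: it needs an external tagger.

# Step 4: detect frequent adjacent pairs and merge them into one token.
bigrams <- unlist(lapply(tokens, function(x) {
  if (length(x) > 1) paste(head(x, -1), tail(x, -1), sep = "_")
}))
compounds <- names(which(table(bigrams) >= 2))  # hypothetical threshold

merge_compounds <- function(x, compounds) {
  out <- character(0)
  i <- 1
  while (i <= length(x)) {
    pair <- if (i < length(x)) paste(x[i], x[i + 1], sep = "_") else NA_character_
    if (!is.na(pair) && pair %in% compounds) {
      out <- c(out, pair)  # greedy left-to-right merge
      i <- i + 2
    } else {
      out <- c(out, x[i])
      i <- i + 1
    }
  }
  out
}
tokens <- lapply(tokens, merge_compounds, compounds = compounds)

# Step 5: drop rare terms (found in fewer than 2 paragraphs) and very common
# terms (found in every paragraph). Both cut-offs are hypothetical.
doc_freq <- table(unlist(lapply(tokens, unique)))
keep <- names(doc_freq)[as.vector(doc_freq >= 2 & doc_freq < length(tokens))]
tokens <- lapply(tokens, function(x) x[x %in% keep])

print(tokens)
```

On this toy input, `euro_area` is detected as a compound but then removed as a very common term, which shows how the last two steps interact.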

datasets

A framework that joins topic modeling and sentiment analysis of
textual data. The package implements a fast Gibbs sampling estimation of
Latent Dirichlet Allocation (Griffiths and Steyvers (2004)
<doi:10.1073/pnas.0307752101>) and Joint Sentiment/Topic Model (Lin, He,
Everson and Ruger (2012) <doi:10.1109/TKDE.2011.48>). It offers a variety of
helpers and visualizations to analyze the result of topic modeling. The
framework also allows enriching topic models with dates and externally
computed sentiment measures. A flexible aggregation scheme enables the
creation of time series of sentiment or topical proportions from the enriched
topic models. Moreover, a novel method jointly aggregates topic proportions
and sentiment measures to derive time series of topical sentiment.
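To give an idea of what Gibbs sampling estimation of Latent Dirichlet Allocation involves, here is a toy collapsed Gibbs sampler in base R. It is a didactic sketch and not the package's implementation: the function name `lda_gibbs()`, its arguments, its default values and the example documents are all hypothetical.

```r
# Toy collapsed Gibbs sampler for LDA (illustrative only).
# docs: list of character vectors, one vector of tokens per document.
lda_gibbs <- function(docs, K = 2, alpha = 1, beta = 0.01, iterations = 50) {
  vocab <- sort(unique(unlist(docs)))
  V <- length(vocab)
  D <- length(docs)
  ids <- lapply(docs, match, table = vocab)  # tokens as vocabulary indices

  ndk <- matrix(0, D, K)  # tokens of document d assigned to topic k
  nkw <- matrix(0, K, V)  # occurrences of word w assigned to topic k
  nk <- numeric(K)        # total tokens assigned to topic k

  # Random initial topic assignment for every token.
  z <- lapply(ids, function(w) sample.int(K, length(w), replace = TRUE))
  for (d in seq_len(D)) {
    for (i in seq_along(ids[[d]])) {
      k <- z[[d]][i]; w <- ids[[d]][i]
      ndk[d, k] <- ndk[d, k] + 1; nkw[k, w] <- nkw[k, w] + 1; nk[k] <- nk[k] + 1
    }
  }

  for (it in seq_len(iterations)) {
    for (d in seq_len(D)) {
      for (i in seq_along(ids[[d]])) {
        k <- z[[d]][i]; w <- ids[[d]][i]
        # Remove the current assignment from the counts.
        ndk[d, k] <- ndk[d, k] - 1; nkw[k, w] <- nkw[k, w] - 1; nk[k] <- nk[k] - 1
        # Full conditional over topics, up to a normalizing constant.
        p <- (ndk[d, ] + alpha) * (nkw[, w] + beta) / (nk + V * beta)
        k <- sample.int(K, 1, prob = p)
        z[[d]][i] <- k
        ndk[d, k] <- ndk[d, k] + 1; nkw[k, w] <- nkw[k, w] + 1; nk[k] <- nk[k] + 1
      }
    }
  }

  theta <- (ndk + alpha) / rowSums(ndk + alpha)  # document-topic proportions
  phi <- (nkw + beta) / rowSums(nkw + beta)      # topic-word distributions
  colnames(phi) <- vocab
  list(theta = theta, phi = phi, z = z)
}

# Hypothetical toy documents.
set.seed(1)
docs <- list(
  c("inflation", "rates", "inflation", "prices"),
  c("growth", "output", "growth", "employment"),
  c("rates", "prices", "inflation", "growth")
)
res <- lda_gibbs(docs, K = 2, iterations = 100)
round(res$theta, 2)
```

Each row of `res$theta` holds the estimated topic proportions of one document, and each row of `res$phi` the word distribution of one topic. Both are computed from the counts of the final sweep only, a common simplification.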

Olivier Delmarcelle

sentopics

Tools for Joint Sentiment and Topic Analysis of Textual Data

Samuel Borms

Chenghua Lin

Yulan He

Jose Bernardo

David Robinson

Julia Silge

ECB_press_conferences_tokens function

Format

Tokenized press conferences — ECB_press_conferences_tokens

