sentopics (version 0.7.4)

ECB_press_conferences_tokens: Tokenized press conferences

Description

The pre-processed and tokenized version of the ECB_press_conferences corpus. The processing involved the following steps:

  • Subset paragraphs shorter than 10 words

  • Removal of stop words

  • Part-of-speech tagging, after which only nouns, proper nouns and adjectives were retained

  • Detection and merging of frequent compound words

  • Frequency-based cleaning of rare and very common words
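
The steps above can be approximated with quanteda on a few toy paragraphs. This is only a sketch under stated assumptions, not the script used to build the dataset: the example texts, the hand-picked compound list and the frequency threshold are all made up for illustration, the first step is read as dropping paragraphs with fewer than 10 words, and the part-of-speech step is skipped because it needs an external tagger.

```r
library(quanteda)

# Toy paragraphs standing in for press-conference text (illustrative only)
txt <- c(
  p1 = "The Governing Council decided to keep the key interest rates unchanged at the current level.",
  p2 = "Thank you.",
  p3 = "Inflation expectations in the euro area remain firmly anchored over the medium term horizon.",
  p4 = "The Governing Council will continue to monitor inflation expectations and interest rates very closely."
)

toks <- tokens(txt, remove_punct = TRUE)
toks <- tokens_tolower(toks)

# Step 1 (assumed reading): keep only paragraphs with at least 10 words
toks <- toks[ntoken(toks) >= 10]

# Step 2: remove English stop words
toks <- tokens_remove(toks, stopwords("en"))

# Step 3 (part-of-speech filtering) is omitted here: it requires a tagger
# that is not part of quanteda.

# Step 4: merge compounds; this list is hand-picked, whereas the dataset
# is described as using detected frequent compounds
compounds <- phrase(c("governing council", "interest rates", "inflation expectations"))
toks <- tokens_compound(toks, compounds)

# Step 5: frequency-based cleaning; the threshold of 2 is arbitrary.
# Very common words could be dropped similarly via dfm_trim(max_docfreq = ...).
kept <- dfm_trim(dfm(toks), min_termfreq = 2)
toks <- tokens_keep(toks, featnames(kept), valuetype = "fixed")

print(toks)
```

On real data the compound list would come from a collocation-detection step and the thresholds would be tuned to the corpus size.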

Usage

ECB_press_conferences_tokens

Format

A quanteda::tokens object.

See Also

ECB_press_conferences

Examples

library(sentopics)
LDA(ECB_press_conferences_tokens)