The pre-processed and tokenized version of the ECB_press_conferences corpus. The processing involved the following steps:
Subsetting paragraphs shorter than 10 words
Removal of stop words
Part-of-speech tagging, following which only nouns, proper nouns and adjectives were retained
Detection and merging of frequent compound words
Frequency-based cleaning of rare and very common words
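The steps above can be sketched with quanteda on a toy example. This is a minimal illustration, not the code used to build the dataset: the three paragraphs, the `"interest rate"` compound, and the `min_termfreq = 2` threshold are made up, the first step is read as dropping paragraphs under 10 words, and the part-of-speech filter is only marked by a comment because it needs an external tagger.

```r
library(quanteda)

# Toy paragraphs standing in for the press-conference corpus (hypothetical text)
txt <- c(
  p1 = "The Governing Council decided to keep the key interest rate unchanged after reviewing the latest data on price stability.",
  p2 = "Thank you very much.",
  p3 = "Inflation in the euro area is expected to remain low while the interest rate stays at the current level for some time."
)

toks <- tokens(corpus(txt), remove_punct = TRUE)
toks <- tokens_tolower(toks)

# Step 1: keep only paragraphs with at least 10 words (assumed reading of the step)
toks <- tokens_subset(toks, ntoken(toks) >= 10)

# Step 2: remove stop words
toks <- tokens_remove(toks, stopwords("en"))

# Step 3: part-of-speech filtering is omitted here; it requires an external tagger

# Step 4: merge a compound word (a fixed, hand-picked phrase instead of detection)
toks <- tokens_compound(toks, pattern = phrase("interest rate"))

# Step 5: frequency-based cleaning; keep features seen at least twice (assumed threshold)
keep <- featnames(dfm_trim(dfm(toks), min_termfreq = 2))
toks_clean <- tokens_select(toks, pattern = keep, valuetype = "fixed")

print(toks_clean)
```

On a real corpus, the compounds would typically be detected from collocation statistics rather than listed by hand, and the frequency thresholds would be tuned to the vocabulary size.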
ECB_press_conferences_tokens
A quanteda::tokens object.
ECB_press_conferences
LDA(ECB_press_conferences_tokens)