tidytext (version 0.1.5)

sentiments: Sentiment lexicons from three sources

Description

Lexicons for sentiment analysis are combined here in a tidy data frame. The lexicons are the NRC Emotion Lexicon from Saif Mohammad and Peter Turney, the sentiment lexicon from Bing Liu and collaborators, the AFINN lexicon of Finn Arup Nielsen, and the lexicon of Tim Loughran and Bill McDonald. Words with non-ASCII characters were removed from the lexicons.

Usage

sentiments

Format

A data frame with 27,314 rows and 4 variables:

word

An English word

sentiment

A sentiment whose possible values depend on the lexicon. The "AFINN" lexicon has no sentiment category (all are NA), and each of the others can be "positive" or "negative". The NRC lexicon can also be "anger", "anticipation", "disgust", "fear", "joy", "sadness", "surprise", or "trust", and the Loughran lexicon can also be "litigious", "uncertainty", "constraining", or "superfluous".

lexicon

The source of the sentiment for the word. One of "nrc", "bing", "AFINN", or "loughran".

score

A numerical score for the sentiment. This value is NA for the Bing, NRC, and Loughran lexicons, and runs between -5 and 5 for the AFINN lexicon.
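
Because all lexicons share one data frame, a single lexicon is usually pulled out by filtering on the lexicon column. The following is a minimal sketch, assuming the four-column format described above (as documented for tidytext 0.1.5; other releases may ship a different format) and that dplyr is installed. The object names afinn and bing are illustrative only.

```r
library(tidytext)
library(dplyr)

# Keep only the AFINN rows; sentiment is NA there, so keep word and score.
afinn <- sentiments %>%
  filter(lexicon == "AFINN") %>%
  select(word, score)

# Keep only the Bing rows; score is NA there, so keep word and sentiment.
bing <- sentiments %>%
  filter(lexicon == "bing") %>%
  select(word, sentiment)

# Count how many rows each lexicon contributes to the combined data frame.
sentiments %>%
  count(lexicon)
```

Dropping the column that is always NA for a given lexicon keeps the subset tidy and avoids carrying missing values into later joins.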

Details

Note that the "loughran" lexicon is best suited for financial text, where words like "share" are not necessarily positive and "liability" is not necessarily negative.
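
One way to see how the lexicons differ on financial vocabulary is to look up the two example words across every lexicon at once. This is a sketch only, assuming the four-column format described above and that dplyr is installed; the name finance_words is hypothetical, and which lexicons actually contain these words is not stated here, so no output is shown.

```r
library(tidytext)
library(dplyr)

# List every entry for the two example words, one row per lexicon and sentiment.
finance_words <- sentiments %>%
  filter(word %in% c("share", "liability")) %>%
  select(word, lexicon, sentiment, score) %>%
  arrange(word, lexicon)

finance_words
```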