sentiment_by: Polarity Score (Sentiment Analysis) By Groups

Description

Approximate the sentiment (polarity) of text by grouping variable(s). See sentiment for a full description of the sentiment detection algorithm, the sentiment/valence shifter keys that can be passed into the function, and other arguments that can be passed.

Usage

sentiment_by(
  text.var,
  by = NULL,
  averaging.function = sentimentr::average_downweighted_zero,
  group.names,
  ...
)

Arguments

text.var

The text variable. Also takes a sentimentr or sentiment_by object.

by

The grouping variable(s). Default NULL uses the original row/element indices; for example, if text.var is a column of 12 rows, those 12 rows are used as the grouping variable. Also takes a single grouping variable or a list of 1 or more grouping variables.

averaging.function

A function for performing the group-by averaging. The default, average_downweighted_zero, downweights zero values in the averaging. Note that the function must handle NAs. The sentimentr functions average_weighted_mixed_sentiment and average_mean are also available. The former upweights negative values when the analyst suspects the speaker is likely to surround negatives with positives (mixed) as a polite social convention while the affective state is still negative. The latter is a standard mean average.
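
As a rough illustration of the NA requirement, the sketch below defines a hypothetical averaging function, average_median (an illustrative name, not part of sentimentr), that drops missing values before summarizing. It assumes the function receives a numeric vector of sentence-level scores for one group. The guarded sentiment_by() call shows the assumed usage and only runs if sentimentr is installed.

```r
# Hypothetical custom averaging function. `average_median` is an illustrative
# name and is NOT exported by sentimentr.
average_median <- function(x, na.rm = TRUE, ...) {
    # Remove missing scores before summarizing so partially missing
    # input still yields a number
    stats::median(x, na.rm = na.rm)
}

# Made-up sentence-level scores for a single group
scores <- c(0.25, NA, -0.5, 0, 0.75)
median_score <- average_median(scores)

# Assumed usage; skipped unless sentimentr is installed
if (requireNamespace("sentimentr", quietly = TRUE)) {
    sents <- sentimentr::get_sentences("I like it. I hate it. It is red.")
    print(sentimentr::sentiment_by(sents, averaging.function = average_median))
}
```

Accepting `...` in the signature is a defensive choice here, so the function does not fail if extra arguments are forwarded to it.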

group.names

A vector of names that corresponds to the groups. Generally for internal use.

...

Other arguments passed to sentiment.

Value

Returns a data.table with grouping variables plus:

element_id - The id number of the original vector passed to sentiment
sentence_id - The id number of the sentences within each element_id
word_count - Word count summed by grouping variable
sd - Standard deviation (sd) of the sentiment/polarity score by grouping variable
ave_sentiment - Sentiment/polarity score mean average by grouping variable
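
To make the per-group columns concrete, the following base R sketch aggregates a made-up table of sentence-level scores by a single grouping variable. The data and the `sent` and `by_person` names are invented for illustration, and a plain mean is used for ave_sentiment, whereas sentiment_by defaults to average_downweighted_zero, so real output may differ.

```r
# Made-up sentence-level scores; values are illustrative, not real output
sent <- data.frame(
    person = c("a", "a", "b", "b", "b"),
    word_count = c(4L, 6L, 3L, 5L, 2L),
    sentiment = c(0.5, -0.5, 0, 0.25, 0.5),
    stringsAsFactors = FALSE
)

# One row per group: summed word count, sd of scores, and a plain mean
# (an assumption; the sentiment_by default averaging function differs)
by_person <- data.frame(
    person = sort(unique(sent$person)),
    word_count = as.vector(tapply(sent$word_count, sent$person, sum)),
    sd = as.vector(tapply(sent$sentiment, sent$person, stats::sd)),
    ave_sentiment = as.vector(tapply(sent$sentiment, sent$person, mean)),
    stringsAsFactors = FALSE
)

print(by_person)
```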

Chaining

sentimentr uses non-standard evaluation when you use with() or %$% (magrittr) and looks for the vectors within the data set passed to it. There is one exception: when you pass a get_sentences() object as the first argument of sentiment_by() (text.var), it calls the sentiment_by.get_sentences_data_frame method, which requires text.var to be a get_sentences_data_frame object. Because this object is a data.frame, the method can access its columns directly (usually text.var is an atomic vector); it only needs the names of the columns to grab.

To illustrate this point, all three of these approaches produce exactly the same output:

## method 1
presidential_debates_2012 %>%
    get_sentences() %>%
    sentiment_by(by = c('person', 'time'))
## method 2
presidential_debates_2012 %>%
    get_sentences() %$%
    sentiment_by(., by = c('person', 'time'))
## method 3
presidential_debates_2012 %>%
    get_sentences() %$%
    sentiment_by(dialogue, by = list(person, time))

Also note that a get_sentences_data_frame object contains a column of class get_sentences_character, which also has a method in sentimentr.

When you use with() or %$%, you are not actually passing the get_sentences_data_frame object to sentimentr, so the sentiment_by.get_sentences_data_frame method isn't called; instead, sentiment_by is evaluated in the environment/data of the get_sentences_data_frame object. You can force the object passed this way to be evaluated as a get_sentences_data_frame object, and thus call the sentiment_by.get_sentences_data_frame method, by using the . operator as I've done in method 2 above. Otherwise you pass the name of the text column, which is actually of class get_sentences_character, and it calls its own method. In this case the by argument expects vectors or a list of vectors, and since it is being evaluated within the data set you can use list().
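
The dispatch difference described here can be mimicked in base R without sentimentr. The sketch below uses a toy generic, describe, and two invented classes, toy_frame and toy_column, which only stand in for get_sentences_data_frame and get_sentences_character; none of these names exist in sentimentr. It shows that with() hands the column, not the data frame, to the generic.

```r
# Toy generic with two methods. Every name here is hypothetical; the classes
# only stand in for get_sentences_data_frame and get_sentences_character.
describe <- function(x, ...) UseMethod("describe")
describe.toy_frame <- function(x, ...) "data frame method"
describe.toy_column <- function(x, ...) "column method"

dat <- data.frame(person = c("a", "b"), stringsAsFactors = FALSE)
dat$dialogue <- structure(
    c("Hi there.", "Bye now."),
    class = c("toy_column", "character")
)
class(dat) <- c("toy_frame", class(dat))

# Passing the whole object dispatches on the data frame's class
whole <- describe(dat)

# with() evaluates the call inside the data, so only the column is passed
# and dispatch happens on the column's class instead
column <- with(dat, describe(dialogue))

print(c(whole = whole, column = column))
```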

Examples

# NOT RUN {
mytext <- c(
   'do you like it?  It is red. But I hate really bad dogs',
   'I am the best friend.',
   "Do you really like it?  I'm not happy"
)

## works on a character vector, but this is not the preferred method because
## it repeats the cost of sentence boundary disambiguation every time
## `sentiment` is run
# }
# NOT RUN {
sentiment(mytext)
sentiment_by(mytext)
# }
# NOT RUN {
## preferred method: split into sentences once to avoid paying that cost repeatedly
mytext <- get_sentences(mytext)

sentiment_by(mytext)
sentiment_by(mytext, averaging.function = average_mean)
sentiment_by(mytext, averaging.function = average_weighted_mixed_sentiment)
get_sentences(sentiment_by(mytext))

(mysentiment <- sentiment_by(mytext, question.weight = 0))
stats::setNames(get_sentences(sentiment_by(mytext, question.weight = 0)),
    round(mysentiment[["ave_sentiment"]], 3))

pres_dat <- get_sentences(presidential_debates_2012)

# }
# NOT RUN {
## less optimized way
with(presidential_debates_2012, sentiment_by(dialogue, person))
# }
# NOT RUN {
sentiment_by(pres_dat, 'person')

(out <- sentiment_by(pres_dat, c('person', 'time')))
plot(out)
plot(uncombine(out))

sentiment_by(out, presidential_debates_2012$person)
with(presidential_debates_2012, sentiment_by(out, time))

highlight(with(presidential_debates_2012, sentiment_by(out, list(person, time))))
# }
# NOT RUN {
## tidy approach
library(dplyr)
library(magrittr)

hu_liu_cannon_reviews %>%
   mutate(review_split = get_sentences(text)) %$%
   sentiment_by(review_split)
# }