sentiment_by: Polarity Score (Sentiment Analysis) By Groups

Description

Approximate the sentiment (polarity) of text by grouping variable(s). See sentiment for a full description of the sentiment detection algorithm, the sentiment/valence shifter keys that can be passed into the function, and the other arguments that can be passed.

Usage

sentiment_by(text.var, by = NULL,
  averaging.function = sentimentr::average_downweighted_zero, group.names,
  ...)

Arguments

text.var

The text variable. Also takes a sentimentr or sentiment_by object.

by

The grouping variable(s). The default, NULL, uses the original row/element indices; for example, if text.var is a column of 12 rows, those 12 rows are used as the grouping variable. Also takes a single grouping variable or a list of one or more grouping variables.

averaging.function

A function for performing the group-by averaging. The default, average_downweighted_zero, downweights zero values in the averaging. Note that the function must handle NAs. The sentimentr functions average_weighted_mixed_sentiment and average_mean are also available. The former upweights negative values when the analyst suspects that the speaker is likely to surround negatives with positives (mixed) as a polite social convention while the underlying affective state is still negative. The latter is a standard mean average.
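A user-defined function can be supplied in the same way. The following is a minimal sketch, not part of sentimentr: the name average_median is hypothetical, and it assumes the averaging function is called with a numeric vector of sentence-level scores and must return a single number. It drops NAs itself, as required above.

```r
# Hypothetical custom averaging function (not part of sentimentr).
# Assumption: it receives a numeric vector of sentence-level scores.
average_median <- function(x) {
  x <- x[!is.na(x)]                     # the function must handle NAs itself
  if (length(x) == 0) return(NA_real_)  # nothing left to average
  stats::median(x)
}

# Stand-alone check on made-up scores
average_median(c(0.5, NA, -0.25, 0))

# Usage with sentiment_by, guarded in case sentimentr is not installed
if (requireNamespace("sentimentr", quietly = TRUE)) {
  txt <- sentimentr::get_sentences("I am happy. I am not sad.")
  print(sentimentr::sentiment_by(txt, averaging.function = average_median))
}
```

A median is less sensitive to a single extreme sentence than a mean; whether that suits a given analysis is a judgment call.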

group.names

A vector of names that corresponds to group. Generally for internal use.

...

Other arguments passed to sentiment.

Value

Returns a data.table with grouping variables plus:

element_id - The id number of the original vector passed to sentiment
sentence_id - The id number of the sentences within each element_id
word_count - Word count summed by grouping variable
sd - Standard deviation (sd) of the sentiment/polarity score by grouping variable
ave_sentiment - Sentiment/polarity score mean average by grouping variable
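To make the grouped columns concrete, the sketch below reproduces the same kind of summary in base R on made-up sentence-level scores. The data and the name by_person are hypothetical, and a plain mean is used for ave_sentiment, so the numbers will differ from what the default average_downweighted_zero would give.

```r
# Hypothetical sentence-level scores; not produced by sentimentr
scores <- data.frame(
  person     = c("A", "A", "B", "B", "B"),
  word_count = c(4, 6, 3, 5, 2),
  sentiment  = c(0.5, -0.1, 0.0, 0.3, 0.6)
)

# Split the rows by group and collapse each group into one summary row
by_person <- do.call(rbind, lapply(split(scores, scores$person), function(d) {
  data.frame(
    person        = d$person[1],
    word_count    = sum(d$word_count, na.rm = TRUE),    # words summed per group
    sd            = stats::sd(d$sentiment, na.rm = TRUE),
    ave_sentiment = mean(d$sentiment, na.rm = TRUE)     # plain mean, for illustration
  )
}))

by_person
```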

Examples


# NOT RUN {
mytext <- c(
   'do you like it?  It is red. But I hate really bad dogs',
   'I am the best friend.',
   "Do you really like it?  I'm not happy"
)

## Works on a character vector, but this is not the preferred method 
## because sentence boundary disambiguation is repeated every time 
## `sentiment` is run
sentiment(mytext)
sentiment_by(mytext)
# }
# NOT RUN {
## Preferred method: split into sentences once and reuse the result
mytext <- get_sentences(mytext)

sentiment_by(mytext)
sentiment_by(mytext, averaging.function = average_mean)
sentiment_by(mytext, averaging.function = average_weighted_mixed_sentiment)
get_sentences(sentiment_by(mytext))

(mysentiment <- sentiment_by(mytext, question.weight = 0))
stats::setNames(get_sentences(sentiment_by(mytext, question.weight = 0)),
    round(mysentiment[["ave_sentiment"]], 3))

pres_dat <- get_sentences(presidential_debates_2012)

# }
# NOT RUN {
## less optimized way
with(presidential_debates_2012, sentiment_by(dialogue, person))
# }
# NOT RUN {
sentiment_by(pres_dat, 'person')

(out <- sentiment_by(pres_dat, c('person', 'time')))
plot(out)
plot(uncombine(out))

sentiment_by(out, presidential_debates_2012$person)
with(presidential_debates_2012, sentiment_by(out, time))

with(cannon_reviews, sentiment_by(review, number))[order(as.numeric(number))]
# }
# NOT RUN {
highlight(with(cannon_reviews, sentiment_by(review, number)))
# }
# NOT RUN {
## tidy approach
library(dplyr)
library(magrittr)

cannon_reviews %>%
   mutate(review_split = get_sentences(review)) %$%
   sentiment_by(review_split, number)
# }
