
Approximate the sentiment (polarity) of text by grouping variable(s). For a
full description of the sentiment detection algorithm, the sentiment/valence
shifter keys that can be passed into the function, and the other arguments
that can be passed, see sentiment.
sentiment_by(
  text.var,
  by = NULL,
  averaging.function = sentimentr::average_downweighted_zero,
  group.names,
  ...
)
text.var: The text variable. Also takes a sentimentr or sentiment_by object.
by: The grouping variable(s). Default NULL uses the original row/element
indices; if you used a column of 12 rows for text.var, these 12 rows will be
used as the grouping variable. Also takes a single grouping variable or a
list of 1 or more grouping variables.
averaging.function: A function for performing the group-by averaging. The
default, average_downweighted_zero, downweights zero values in the averaging.
Note that the function must handle NAs. The sentimentr functions
average_weighted_mixed_sentiment and average_mean are also available. The
former upweights negative values when the analyst suspects the speaker is
likely to surround negatives with positives (mixed) as a polite social
convention while the affective state is still negative. The latter is a
standard mean average.
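The requirement that a custom averaging function handle NAs can be sketched as follows. This is a minimal illustration, not part of sentimentr: `average_median` is a hypothetical name, and the sketch assumes the function is called with the vector of sentence-level scores for each group. The final block only runs if sentimentr is installed.

```r
# Hypothetical NA-safe averaging function (not a sentimentr function):
# the median of the sentence-level scores within a group.
# `...` absorbs any extra arguments so the signature stays flexible.
average_median <- function(x, na.rm = TRUE, ...) {
  stats::median(x, na.rm = na.rm)
}

# Check on a plain numeric vector containing an NA
scores <- c(0.5, NA, -0.25, 0)
average_median(scores)

# Passing it to sentiment_by(); skipped when sentimentr is not installed
if (requireNamespace("sentimentr", quietly = TRUE)) {
  sentences <- sentimentr::get_sentences(
    c("I like it. It is red.", "I hate really bad dogs.")
  )
  print(sentimentr::sentiment_by(sentences, averaging.function = average_median))
}
```

A median is less sensitive to a single extreme sentence than a mean, which may or may not be what an analysis needs; the built-in averaging functions remain the documented choices.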
group.names: A vector of names that corresponds to group. Generally for internal use.
...: Other arguments passed to sentiment.
Returns a data.table with grouping variables plus:
element_id - The id number of the original vector passed to sentiment
sentence_id - The id number of the sentences within each element_id
word_count - Word count summed by grouping variable
sd - Standard deviation (sd) of the sentiment/polarity score by grouping variable
ave_sentiment - Sentiment/polarity score mean average by grouping variable
sentimentr uses non-standard evaluation when you use with() or %$% (magrittr)
and looks for the vectors within the data set passed to it. There is one
exception to this: when you pass a get_sentences() object to sentiment_by()
as the first argument (text.var), it calls the
sentiment_by.get_sentences_data_frame method, which requires text.var to be a
get_sentences_data_frame object. Because this object is a data.frame, the
method knows it can access the columns of the get_sentences_data_frame object
directly (usually text.var is an atomic vector); it just needs the names of
the columns to grab.
To illustrate this point, understand that all three of these approaches result in exactly the same output:
## method 1
presidential_debates_2012 %>%
    get_sentences() %>%
    sentiment_by(by = c('person', 'time'))

## method 2
presidential_debates_2012 %>%
    get_sentences() %$%
    sentiment_by(., by = c('person', 'time'))

## method 3
presidential_debates_2012 %>%
    get_sentences() %$%
    sentiment_by(dialogue, by = list(person, time))
Also realize that a get_sentences_data_frame object has a column of class
get_sentences_character, which also has a method in sentimentr.
When you use with() or %$%, you're not actually passing the
get_sentences_data_frame object to sentimentr, and hence the
sentiment_by.get_sentences_data_frame method isn't called; rather,
sentiment_by is evaluated in the environment/data of the
get_sentences_data_frame object. You can force the object passed this way to
be evaluated as a get_sentences_data_frame object, and thus call the
sentiment_by.get_sentences_data_frame method, by using the . operator as I've
done in method 2 above. Otherwise you pass the name of the text column, which
is actually of class get_sentences_character, and it calls its own method. In
this case the by argument expects vectors or a list of vectors, and since
it's being evaluated within the data set you can use list().
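The dispatch behaviour described above can be sketched with a toy S3 generic in base R. Everything here is hypothetical and only mirrors the idea: `describe`, `toy_sentences_df`, and `toy_sentences_chr` are made-up names standing in for sentiment_by, get_sentences_data_frame, and get_sentences_character.

```r
# Toy S3 generic (hypothetical; not part of sentimentr) that reports which
# method handled the call.
describe <- function(x, ...) UseMethod("describe")
describe.toy_sentences_df  <- function(x, ...) "data frame method"
describe.toy_sentences_chr <- function(x, ...) "character method"

# A data frame whose text column carries its own class, mirroring how a
# get_sentences_data_frame holds a get_sentences_character column.
text_col <- structure(
  c("I like it.", "I hate it."),
  class = c("toy_sentences_chr", "character")
)
dat <- data.frame(person = c("a", "b"), stringsAsFactors = FALSE)
dat$dialogue <- text_col
class(dat) <- c("toy_sentences_df", class(dat))

# Passing the object itself dispatches on the data frame class
via_object <- describe(dat)

# Inside with(), only the column is passed, so dispatch uses the column class
via_with <- with(dat, describe(dialogue))

print(c(object = via_object, with = via_with))
```

In this sketch the call inside with() never sees the data frame, which is the reason the by argument then has to be given as vectors (or a list of vectors) rather than column names.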
Other sentiment functions: sentiment()
# NOT RUN {
mytext <- c(
    'do you like it? It is red. But I hate really bad dogs',
    'I am the best friend.',
    "Do you really like it? I'm not happy"
)
## works on a character vector, but this is not the preferred method because
## it repeats the cost of sentence boundary disambiguation every time
## `sentiment` is run
sentiment(mytext)
sentiment_by(mytext)
# }
# NOT RUN {
## preferred method: split the sentences once to avoid paying that cost repeatedly
mytext <- get_sentences(mytext)
sentiment_by(mytext)
sentiment_by(mytext, averaging.function = average_mean)
sentiment_by(mytext, averaging.function = average_weighted_mixed_sentiment)
get_sentences(sentiment_by(mytext))
(mysentiment <- sentiment_by(mytext, question.weight = 0))
stats::setNames(get_sentences(sentiment_by(mytext, question.weight = 0)),
    round(mysentiment[["ave_sentiment"]], 3))
pres_dat <- get_sentences(presidential_debates_2012)
# }
# NOT RUN {
## less optimized way
with(presidential_debates_2012, sentiment_by(dialogue, person))
# }
# NOT RUN {
sentiment_by(pres_dat, 'person')
(out <- sentiment_by(pres_dat, c('person', 'time')))
plot(out)
plot(uncombine(out))
sentiment_by(out, presidential_debates_2012$person)
with(presidential_debates_2012, sentiment_by(out, time))
highlight(with(presidential_debates_2012, sentiment_by(out, list(person, time))))
# }
# NOT RUN {
## tidy approach
library(dplyr)
library(magrittr)
hu_liu_cannon_reviews %>%
    mutate(review_split = get_sentences(text)) %$%
    sentiment_by(review_split)
# }