sentimentr (version 2.6.1)

profanity_by: Profanity Rate By Groups

Description

Approximate the profanity of text by grouping variable(s). See profanity for a full description of the profanity detection algorithm, the profanity key that can be passed into the function, and the other arguments that can be passed.
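
A minimal sketch of the basic call (the two-element text vector is invented for illustration; as the Examples below show, running get_sentences() first is the preferred approach):

library(sentimentr)

txt <- c("What the heck is this?", "It works fine for me.")  ## made-up text
profanity_by(get_sentences(txt))  ## with by = NULL, one row per original element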

Usage

profanity_by(text.var, by = NULL, group.names, ...)

Arguments

text.var

The text variable. Also takes a profanity or profanity_by object.

by

The grouping variable(s). Default NULL uses the original row/element indices; if you used a column of 12 rows for text.var, these 12 rows will be used as the grouping variable. Also takes a single grouping variable or a list of 1 or more grouping variables.
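
For illustration, the three accepted forms might look like this (the data frame dat and its person/time columns are hypothetical; the with()/list() pattern matches the grouped call in the Examples below):

dat <- data.frame(
    person = c('sam', 'sam', 'sue'),
    time   = c('am', 'pm', 'pm'),
    text   = c('I hate this darn thing.', 'It is fine.', 'What the heck?'),
    stringsAsFactors = FALSE
)

profanity_by(get_sentences(dat$text))                                  ## by = NULL: original elements
with(dat, profanity_by(get_sentences(text), by = person))              ## a single grouping variable
with(dat, profanity_by(get_sentences(text), by = list(person, time)))  ## a list of grouping variables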

group.names

A vector of names that corresponds to the grouping variable(s). Generally for internal use.

...

Other arguments passed to profanity.
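
For example, a different profanity key can be swapped in through ... via profanity's profanity_list argument (see ?profanity); lexicon::profanity_zac_anger is used here purely for illustration:

txt <- c('Well, golly gee.', 'This is darn annoying.')  ## made-up text
profanity_by(
    get_sentences(txt),
    profanity_list = unique(tolower(lexicon::profanity_zac_anger))  ## forwarded to profanity
)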

Value

Returns a data.table with the grouping variable(s) plus the columns below (see the sketch after the list):

  • element_id - The id number of the original vector passed to profanity

  • sentence_id - The id number of the sentences within each element_id

  • word_count - Word count summed by grouping variable

  • profanity_count - The number of profanities used by grouping variable

  • sd - Standard deviation (sd) of the sentence level profanity rate by grouping variable

  • ave_profanity - Average profanity rate by grouping variable
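
A short access sketch (the text is invented; the result is a data.table, so standard data.table syntax applies):

library(data.table)

out <- profanity_by(get_sentences(c('What the heck?', 'This is fine. Really fine.')))
out[, .(word_count, profanity_count, ave_profanity)]  ## select the summary columns
out[order(-ave_profanity)]                            ## order groups by profanity rate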

Chaining

See sentiment_by for details about sentimentr chaining.
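
For illustration, assuming magrittr is attached, the same calls can be written as chains; %$% exposes the data's columns the way with() does in the Examples below:

library(magrittr)

crowdflower_deflategate %>%  ## same tweet data used in the Examples
    get_sentences() %>%
    profanity_by()

crowdflower_deflategate %$%  ## with()-style exposition pipe
    profanity_by(get_sentences(text))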

Examples

# NOT RUN {
bw <- sample(lexicon::profanity_alvarez, 4)
mytext <- c(
   sprintf('do you like this %s?  It is %s. But I hate really bad dogs', bw[1], bw[2]),
   'I am the best friend.',
   NA,
   sprintf('I %s hate this %s', bw[3], bw[4]),
   "Do you really like it?  I'm not happy"
)

## Works on a character vector, but this is not the preferred method: it pays the
## cost of sentence boundary disambiguation every time `profanity` is run
profanity(mytext)
profanity_by(mytext)

## Preferred method: run sentence boundary disambiguation once up front
mytext <- get_sentences(mytext)

profanity_by(mytext)
get_sentences(profanity_by(mytext))

(myprofanity <- profanity_by(mytext))
stats::setNames(get_sentences(profanity_by(mytext)),
    round(myprofanity[["ave_profanity"]], 3))

## Grouped profanity on the crowdflower_deflategate tweet data
brady <- get_sentences(crowdflower_deflategate)
library(data.table)
bp <- profanity_by(brady)

## Pull the original tweets whose elements contain any profanity
crowdflower_deflategate[bp[ave_profanity > 0, ]$element_id, ]

## Name the profane sentence groups with their rounded profanity rates
vulgars <- bp[["ave_profanity"]] > 0
stats::setNames(get_sentences(bp)[vulgars],
    round(bp[["ave_profanity"]][vulgars], 3))
    
## Build grouping variables: retweet vs. original post, and a loose regex match for Belichick
bt <- data.table(crowdflower_deflategate)[, 
    source := ifelse(grepl('^RT', text), 'retweet', 'OP')][,
    belichick := grepl('\\bb[A-Za-z]+l[A-Za-z]*ch', text, ignore.case = TRUE)][]

prof_bel <- with(bt, profanity_by(text, by = list(source, belichick)))

## Plot profanity rates by the source/belichick groups
plot(prof_bel)
# }
