The equation used by the algorithm to assign value to the polarity of
each sentence first utilizes the sentiment dictionary to tag polarized words.
Each paragraph
(\(p_i = \{s_1, s_2, ..., s_n\}\)), composed of
sentences, is broken into element sentences
(\(s_{i,j} = \{w_1, w_2, ..., w_n\}\)) where \(w\)
are the words within sentences. Each sentence (\(s_{i,j}\)) is broken into
an ordered bag of words. Punctuation is removed with the exception of pause
punctuations (commas, colons, semicolons) which are considered a word within
the sentence. I will denote pause words as \(cw\) (comma words) for
convenience. We can represent these words in an i,j,k notation as
\(w_{i,j,k}\). For example, \(w_{3,2,5}\) would be the fifth word of the
second sentence of the third paragraph. While I use the term paragraph, this
merely represents a complete turn of talk. For example, it may be a cell level
response in a questionnaire composed of sentences.
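The tokenization step described above can be sketched as follows. This is only an illustration in Python, not the package's own code: the function name `sentence_to_words`, the regular expression, and the lowercasing are assumptions made for the example.

```python
import re

def sentence_to_words(sentence):
    """Split a sentence into an ordered list of words.

    Pause marks (comma, colon, semicolon) are kept as words of their own;
    all other punctuation is dropped. Lowercasing is an assumption made
    here so that dictionary lookups are case-insensitive.
    """
    # Match either a run of letters/digits/apostrophes or a single pause mark.
    tokens = re.findall(r"[A-Za-z0-9']+|[,:;]", sentence)
    return [t.lower() for t in tokens]

words = sentence_to_words("I like it, but it is not great.")

# 1-based positions, mirroring the k index in w_{i,j,k}.
indexed = {k: w for k, w in enumerate(words, start=1)}
```

Here the comma occupies position 4 in the sentence and the final period is discarded.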
The words in each sentence (\(w_{i,j,k}\)) are searched and compared to a
dictionary of polarized words (e.g., Jockers (2017) dictionary found in
the lexicon package). Positive (\(w_{i,j,k}^{+}\)) and
negative (\(w_{i,j,k}^{-}\)) words are tagged with a \(+1\)
and \(-1\) respectively. I will denote polarized words as \(pw\) for
convenience. These will form a polar cluster (\(c_{i,j,l}\))
which is a subset of the sentence
(\(c_{i,j,l} \subseteq s_{i,j}\)).
The polarized context cluster (\(c_{i,j,l}\)) of words is pulled from around
the polarized word (\(pw\)) and defaults to four words before and two words
after \(pw\), which are considered as valence shifters. The cluster can be represented as
(\(c_{i,j,l} = \{pw_{i,j,k - nb}, ..., pw_{i,j,k} , ..., pw_{i,j,k + na}\}\)),
where \(nb\) and \(na\) are the parameters n.before
and n.after
set by the user. The words in this polarized context cluster are tagged as
neutral (\(w_{i,j,k}^{0}\)), negator (\(w_{i,j,k}^{n}\)),
amplifier [intensifier] (\(w_{i,j,k}^{a}\)), or de-amplifier
[downtoner] (\(w_{i,j,k}^{d}\)). Neutral words hold no value in
the equation but do affect word count (\(n\)). Each polarized word is then
weighted (\(w\)) based on the weights from the polarity_dt
argument
and then further weighted by the function and number of the valence shifters
directly surrounding the positive or negative word (\(pw\)). Pause
(\(cw\)) locations (punctuation that denotes a pause including commas,
colons, and semicolons) are indexed and considered in calculating the upper
and lower bounds in the polarized context cluster. This is because these marks
indicate a change in thought and words prior are not necessarily connected
with words after these punctuation marks. The lower bound of the polarized
context cluster is constrained to
\(\max \{pw_{i,j,k - nb}, 1, \max \{cw_{i,j,k} < pw_{i,j,k}\}\}\) and the upper bound is
constrained to \(\min \{pw_{i,j,k + na}, w_{i,jn}, \min \{cw_{i,j,k} > pw_{i,j,k}\}\}\)
where \(w_{i,jn}\) is the number of words in the sentence.
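The bounds above can be sketched in Python as follows. This is a literal reading of the two constraints, not the package's implementation: `cluster_bounds` and its argument names are hypothetical, positions are 1-based to match the \(k\) index, and, as in the formulas, the nearest pause word itself marks the bound.

```python
PAUSE_MARKS = {",", ":", ";"}

def cluster_bounds(words, k, n_before=4, n_after=2):
    """Return 1-based (lower, upper) positions of the context cluster
    around the polarized word at position k."""
    n = len(words)  # number of words in the sentence
    pauses = [pos for pos, w in enumerate(words, start=1) if w in PAUSE_MARKS]
    prior = [p for p in pauses if p < k]   # pause words before the polarized word
    later = [p for p in pauses if p > k]   # pause words after the polarized word
    # Lower bound: no further back than n_before words, the sentence start,
    # or the closest preceding pause word.
    lower = max(k - n_before, 1, max(prior) if prior else 1)
    # Upper bound: no further forward than n_after words, the sentence end,
    # or the closest following pause word.
    upper = min(k + n_after, n, min(later) if later else n)
    return lower, upper

words = ["i", "like", "it", ",", "but", "it", "is", "not", "great"]
bounds_like = cluster_bounds(words, 2)    # "like": cut off by the comma at 4
bounds_great = cluster_bounds(words, 9)   # "great": limited by n_before
```

With the defaults, the cluster around "like" stops at the comma, so "not" can never act on it.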
The core value in the cluster, the polarized word, is acted upon by valence
shifters. Amplifiers (intensifiers) increase the polarity by a factor of 1.8 (1 plus the weight \(z\), which defaults to .8).
Amplifiers (\(w_{i,j,k}^{a}\)) become de-amplifiers if the context
cluster contains an odd number of negators (\(w_{i,j,k}^{n}\)). De-amplifiers
(downtoners) work to decrease the polarity. Negation (\(w_{i,j,k}^{n}\)) acts on
amplifiers/de-amplifiers as discussed but also flips the sign of the polarized
word. Negation is determined by raising -1 to the power of the number of
negators (\(w_{i,j,k}^{n}\)) + 2. Simply, this is a result of a belief that two
negatives equal a positive, three negatives a negative, and so on.
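A small Python sketch of how negators, amplifiers, and de-amplifiers might combine, following the prose above. It is my reading of the description rather than the package's code: the function names are hypothetical, adversative conjunctions are ignored, and the assumption is that each amplifier adds \(z\) when the number of negators is even and subtracts \(z\) when it is odd.

```python
def negation_sign(n_negators):
    """(-1) raised to (number of negators + 2): an even count keeps the
    sign of the polarized word, an odd count flips it."""
    return (-1) ** (n_negators + 2)

def shifter_multiplier(n_amplifiers, n_deamplifiers, n_negators, z=0.8):
    """Multiplier applied to a polarized word by its valence shifters.

    Assumption: with an odd number of negators, amplifiers are counted
    as de-amplifiers. The de-amplifier term is bounded below by -1.
    """
    w_neg = n_negators % 2                       # 1 if odd number of negators
    amp = (1 - w_neg) * z * n_amplifiers         # amplifiers that stay amplifiers
    deamp = max(-z * (w_neg * n_amplifiers + n_deamplifiers), -1.0)
    return 1 + amp + deamp

# "very good": one amplifier, no negators
very_good = shifter_multiplier(1, 0, 0) * negation_sign(0) * 1
# "not very good": the amplifier is treated as a de-amplifier and the sign flips
not_very_good = shifter_multiplier(1, 0, 1) * negation_sign(1) * 1
```

Under these assumptions "very good" is weighted more strongly than "good", while "not very good" comes out mildly negative rather than strongly negative.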
The adversative conjunctions (i.e., 'but', 'however', and 'although') also
weight the context cluster. An adversative conjunction before the polarized
word (\(w_{adversative\,conjunction}, ..., w_{i, j, k}^{p}\)) up-weights
the cluster by
\(1 + z_2 * \{|w_{adversative\,conjunction}|, ..., w_{i, j, k}^{p}\}\)
(.85 is the default weight (\(z_2\))). An adversative conjunction after
the polarized word down-weights the cluster by
\(1 + \{w_{i, j, k}^{p}, ..., |w_{adversative\,conjunction}| * -1\} * z_2\).
The number of occurrences before and after the polarized word are multiplied by
1 and -1 respectively and then summed within the context cluster. It is this
value that is multiplied by the weight and added to 1. This
corresponds to the belief that an adversative conjunction makes the next
clause of greater value while lowering the value placed on the prior clause.
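The adversative weighting just described (count occurrences before as +1 and after as -1, multiply by \(z_2\), add 1) can be sketched in Python. The function name and the 0-based `pw_index` argument are assumptions made for the example, and the word set only contains the three conjunctions named in the text.

```python
ADVERSATIVES = {"but", "however", "although"}

def adversative_weight(cluster, pw_index, z2=0.85):
    """Weight contributed by adversative conjunctions in a context cluster.

    cluster  : ordered list of words in the cluster
    pw_index : 0-based position of the polarized word within the cluster
    """
    # Occurrences before the polarized word count as +1 each.
    before = sum(1 for w in cluster[:pw_index] if w in ADVERSATIVES)
    # Occurrences after the polarized word count as -1 each.
    after = sum(1 for w in cluster[pw_index + 1:] if w in ADVERSATIVES)
    return 1 + z2 * (before - after)

up = adversative_weight(["but", "it", "is", "great"], 3)   # conjunction before
down = adversative_weight(["great", "but", "slow"], 0)     # conjunction after
```

With the default weight, a preceding "but" gives 1.85 and a following "but" gives 0.15.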
The researcher may provide a weight \(z\) to be utilized with
amplifiers/de-amplifiers (default is .8; the de-amplifier weight is constrained
to a lower bound of -1). Last, these weighted context clusters (\(c_{i,j,l}\)) are
summed (\(c'_{i,j}\)) and divided by the square root of the word count (\(\sqrt{w_{i,jn}}\)), yielding an unbounded
polarity score (\(\delta\)) for each sentence.
$$\delta=\frac{c'_{i,j}}{\sqrt{w_{i,jn}}}$$
Where:
$$c'_{i,j}=\sum{((1 + w_{amp} + w_{deamp})\cdot w_{i,j,k}^{p}(-1)^{2 + w_{neg}})}$$
$$w_{amp}= (w_{b} > 1) + \sum{((1 - w_{neg})\cdot (z \cdot w_{i,j,k}^{a}))}$$
$$w_{deamp} = \max(w_{deamp'}, -1)$$
$$w_{deamp'}= (w_{b} < 1) + \sum{(z(- w_{neg}\cdot w_{i,j,k}^{a} + w_{i,j,k}^{d}))}$$
$$w_{b} = 1 + z_2 * w_{b'}$$
$$w_{b'} = \sum{(|w_{adversative\,conjunction}|, ..., w_{i, j, k}^{p}, w_{i, j, k}^{p}, ..., |w_{adversative\,conjunction}| * -1)}$$
$$w_{neg}= \left(\sum{w_{i,j,k}^{n}}\right) \bmod {2}$$
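As a final illustration, the last step (\(\delta = c'_{i,j} / \sqrt{w_{i,jn}}\)) can be sketched in Python. The function name is hypothetical and the cluster values are made up for the example; they stand in for context clusters that have already been weighted as described above.

```python
import math

def sentence_polarity(weighted_clusters, word_count):
    """Sum the weighted context clusters of one sentence and divide by
    the square root of the sentence's word count."""
    return sum(weighted_clusters) / math.sqrt(word_count)

# Two already-weighted clusters in a 16-word sentence (illustrative values).
delta = sentence_polarity([1.8, -1.0], 16)
```

Dividing by the square root rather than the raw word count means longer sentences are penalized less steeply for their length, and the resulting score is not bounded to a fixed interval.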