analyzeSentiment: Sentiment analysis

Description

Performs sentiment analysis of a given object (vector of strings, data.frame, corpus, term-document matrix or document-term matrix).

Usage

analyzeSentiment(x, language = "english", aggregate = NULL,
  rules = defaultSentimentRules(), removeStopwords = TRUE, ...)
# S3 method for Corpus
analyzeSentiment(x, language = "english", aggregate = NULL,
  rules = defaultSentimentRules(), removeStopwords = TRUE, ...)
# S3 method for character
analyzeSentiment(x, language = "english",
  aggregate = NULL, rules = defaultSentimentRules(),
  removeStopwords = TRUE, ...)
# S3 method for data.frame
analyzeSentiment(x, language = "english",
  aggregate = NULL, rules = defaultSentimentRules(),
  removeStopwords = TRUE, ...)
# S3 method for TermDocumentMatrix
analyzeSentiment(x, language = "english",
  aggregate = NULL, rules = defaultSentimentRules(),
  removeStopwords = TRUE, ...)
# S3 method for DocumentTermMatrix
analyzeSentiment(x, language = "english",
  aggregate = NULL, rules = defaultSentimentRules(),
  removeStopwords = TRUE, ...)

Arguments

x

A vector of characters, a data.frame, or an object of type Corpus, TermDocumentMatrix or DocumentTermMatrix

language

Language used for preprocessing operations (default: English)

aggregate

A factor variable by which documents can be grouped. This is helpful when joining, e.g., news from the same day or movie reviews by the same author

rules

A named list containing individual sentiment metrics. Each entry is itself a list consisting of a method, optionally followed by a dictionary.

removeStopwords

Flag indicating whether to remove stopwords or not (default: TRUE)

...

Additional parameters passed on to the function, e.g. for preprocessing
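
The structure expected by rules can be illustrated with a small sketch. It assumes the SentimentAnalysis package is installed; the rule names "WordCount" and "SentimentGI" and the variable names are chosen for illustration only. The first entry holds a method without a dictionary, the second a method followed by a dictionary.

```r
library(SentimentAnalysis)

# Each entry is a list: first the method, then an optional dictionary.
# The entry names ("WordCount", "SentimentGI") are illustrative labels.
rules <- list(
  "WordCount"   = list(ruleWordCount),                     # method only
  "SentimentGI" = list(ruleSentiment, loadDictionaryGI())  # method + dictionary
)

documents <- c("Positive text", "Neutral but uncertain text", "Negative text")
sentiment <- analyzeSentiment(documents, rules = rules)
sentiment
```

Only the rules listed are computed, so the result should contain one set of values per named entry.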

Value

Result is a data.frame with sentiment values for each document across all defined rules

Details

This function returns a data.frame with continuous values. Other formats require an explicit conversion. Common examples are binary response values (positive, negative) and ternary ones (positive, neutral, negative). For this purpose, consider the functions convertToBinaryResponse and convertToDirection, which convert a vector of continuous sentiment scores into a factor object.
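
As a rough sketch of this conversion, the snippet below computes a single rule and then converts its continuous scores. It assumes the SentimentAnalysis package is installed; the column name "SentimentLM" comes from the rule name chosen here, and the other variable names are illustrative.

```r
library(SentimentAnalysis)

documents <- c("Positive text", "Neutral but uncertain text", "Negative text")

# Compute one rule only, so the result has a single, known column name
sentiment <- analyzeSentiment(documents,
                              rules = list("SentimentLM" = list(ruleSentiment,
                                                                loadDictionaryLM())))

# Continuous scores, one per document
scores <- sentiment$SentimentLM

# Two-level factor (positive / negative)
binary <- convertToBinaryResponse(scores)

# Three-level factor (positive / neutral / negative)
direction <- convertToDirection(scores)

binary
direction
```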

Examples

# via vector of strings
corpus <- c("Positive text", "Neutral but uncertain text", "Negative text")
sentiment <- analyzeSentiment(corpus)
compareToResponse(sentiment, c(+1, 0, -2))

# via Corpus from tm package
library(tm)
reut21578 <- system.file("texts", "crude", package="tm")
reuters <- Corpus(DirSource(reut21578),
                  readerControl=list(reader=readReut21578XML))
sentiment <- analyzeSentiment(reuters)

# via DocumentTermMatrix (with stemmed entries)
dtm <- DocumentTermMatrix(Corpus(VectorSource(c("posit posit", "negat neutral")))) 
sentiment <- analyzeSentiment(dtm)
compareToResponse(sentiment, convertToBinaryResponse(c(+1, -1)))

# By adapting the parameter rules, one can incorporate customized dictionaries
# e.g. in order to adapt to arbitrary languages
dictionaryAmplifiers <- SentimentDictionary(c("more", "much"))
sentiment <- analyzeSentiment(corpus,
                              rules=list("Amplifiers"=list(ruleRatio,
                                                           dictionaryAmplifiers)))

# One can also restrict the computed rules to the ones of interest
# in order to improve performance
sentiment <- analyzeSentiment(corpus,
                              rules=list("SentimentLM"=list(ruleSentiment, 
                                                            loadDictionaryLM())))
sentiment