kotzias_reviews: Kotzias Reviews

Description

A dataset containing a list of 3 review data sets. Each data set contains sentences rated as positive (1) or negative (-1), taken from reviews of products, movies, and restaurants. The data, compiled by Kotzias, Denil, De Freitas, & Smyth (2015), was originally taken from amazon.com, imdb.com, and yelp.com. Kotzias et al. (2015) provide the following description in the README: "For each website, there exist 500 positive and 500 negative sentences. Those were selected randomly for larger datasets of reviews. We attempted to select sentences that have a clearly positive or negative connotaton [sic], the goal was for no neutral sentences to be selected." This data set has been manipulated from the original: each text element has been split into sentences, and the original 0/1 metric has been converted to -1/1. Please cite Kotzias et al. (2015) if you reuse the data here.
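
The 0/1 to -1/1 conversion described above amounts to a simple linear map. A minimal sketch, using a made-up vector (`original` is a hypothetical name, not part of the data set):

```r
# Hypothetical 0/1 labels in the style of the original files (not real data)
original <- c(0, 1, 1, 0)

# Map 0 -> -1 and 1 -> 1
rating <- 2 * original - 1
rating
```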

Usage

data(kotzias_reviews)

Format

A list with 3 elements

Details

Each data set is a data frame with the following columns:

text. The sentences from the review.
rating. A human scoring of the text: 1 (positive) or -1 (negative).
element_id. An index for the original text element (row number).
sentence_id. A sentence number from 1-n within each element_id.
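
The column layout above can be illustrated with a small stand-in data frame. This is only a sketch: the sentences and the name `toy` are invented for illustration and are not taken from the actual data; the same operations should apply to any one element of the list.

```r
# Toy stand-in with the four documented columns (invented rows, not real data)
toy <- data.frame(
  text = c("Great phone.", "Battery lasts all day.", "Stopped working after a week."),
  rating = c(1, 1, -1),
  element_id = c(1L, 1L, 2L),
  sentence_id = c(1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Number of sentences in each original text element
sentences_per_element <- table(toy$element_id)

# Counts of negative (-1) and positive (1) sentences
rating_counts <- table(toy$rating)
```

Here `element_id` groups sentences that came from the same original row, and `sentence_id` restarts at 1 within each group.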

References

Kotzias, D., Denil, M., De Freitas, N., & Smyth, P. (2015). From group to individual labels using deep features. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 597-606. http://mdenil.com/media/papers/2015-deep-multi-instance-learning.pdf