data_corpus_LMRDsample: Sample from the Large Movie Review Dataset (Maas et al. 2011)
Description
A sample of 100 positive and 100 negative reviews from the Maas et al. (2011)
dataset for sentiment classification. The original dataset contains 50,000
highly polar movie reviews.
Usage
data_corpus_LMRDsample
Format
The corpus docvars consist of:
docnumber
document serial number, counted within each set and polarity
rating
user-assigned movie rating on an integer scale from 1 to 10
polarity
either neg or pos, indicating whether the movie review was negative or
positive. See Maas et al. (2011) for the rating cut-off values that governed
this assignment.
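The polarity label is derived from the rating. The base-R sketch below assumes the cut-offs documented in the README of the original Large Movie Review Dataset: a rating of 4 or lower is negative, a rating of 7 or higher is positive, and neutral ratings (5 and 6) are excluded. `rating_to_polarity()` is a hypothetical helper for illustration, not a function from quanteda or this package. The final block uses `quanteda::docvars()` to cross-tabulate rating against polarity in the corpus, and it runs only if quanteda is installed and the corpus object is available.

```r
# Hypothetical helper (not part of quanteda): map 1-10 ratings to polarity,
# assuming the Maas et al. (2011) cut-offs: <= 4 is "neg", >= 7 is "pos".
# Neutral ratings (5 and 6) are excluded and returned as NA.
rating_to_polarity <- function(rating) {
  stopifnot(is.numeric(rating),
            all(rating >= 1 & rating <= 10, na.rm = TRUE))
  out <- rep(NA_character_, length(rating))
  # which() drops NA comparisons, so missing ratings stay NA
  out[which(rating <= 4)] <- "neg"
  out[which(rating >= 7)] <- "pos"
  factor(out, levels = c("neg", "pos"))
}

print(rating_to_polarity(c(1, 4, 5, 6, 7, 10)))

# Optional check against the corpus docvars: each rating should fall on
# the side of the cut-offs that matches its polarity label.
if (requireNamespace("quanteda", quietly = TRUE) &&
    exists("data_corpus_LMRDsample")) {
  dv <- quanteda::docvars(data_corpus_LMRDsample)
  print(table(rating = dv$rating, polarity = dv$polarity))
}
```

If the assumed cut-offs hold for this sample, the cross-tabulation should show no neg documents with ratings of 7 or above and no pos documents with ratings of 4 or below.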
References
Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and
Christopher Potts (2011). "Learning Word Vectors for Sentiment Analysis." The
49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).
See Also
data_codebook_sentiment for an example codebook and usage with this corpus
Examples
if (requireNamespace("quanteda", quietly = TRUE)) {
  # Inspect the corpus
  summary(data_corpus_LMRDsample)
  # Show the first few reviews
  head(data_corpus_LMRDsample, 3)
}