This is a subset of the data used in the paper, which
was assembled by Prettenhofer and Stein (2010). It contains 1000 reviews
of books on Amazon, of which 500 were selected from the original training data
and 500 from the test data.
The full dataset has been used for a variety of things, including
classification using svm. The subset was chosen small enough to keep the computation
time low, while still containing the examples in the paper.