The data set CNAE2 is a subset of the original CNAE-9 data, which
comprises 1080 documents of free-text business descriptions of Brazilian
companies, categorized into 9 topics.
Specifically, CNAE2 contains only the documents belonging to topics "4" and "9".
The data set is already pre-processed and provides the bag-of-words representation of
the documents; columns whose counts are all zero have been removed, leading to a matrix of 240 documents
over a vocabulary of 357 terms. This data set is highly sparse
(98% of the matrix entries are zero).
Class labels are stored in cl_CNAE.
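
The pre-processing described above (dropping vocabulary columns that contain no counts, then measuring how sparse the remaining matrix is) can be illustrated with a minimal sketch. The toy matrix and the helper names `drop_empty_columns` and `sparsity` are assumptions made for illustration only; they are not part of the data set or of any package, and the counts are not taken from CNAE2.

```python
# Toy bag-of-words matrix: rows are documents, columns are vocabulary terms.
# These counts are made up for illustration; they are not taken from CNAE2.
counts = [
    [0, 2, 0, 0, 1],
    [0, 0, 0, 0, 3],
    [1, 0, 0, 0, 0],
]


def drop_empty_columns(matrix):
    """Keep only the columns that have at least one nonzero count."""
    n_cols = len(matrix[0])
    keep = [j for j in range(n_cols) if any(row[j] != 0 for row in matrix)]
    return [[row[j] for j in keep] for row in matrix]


def sparsity(matrix):
    """Return the fraction of entries that are equal to zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


reduced = drop_empty_columns(counts)
print(len(reduced), "documents x", len(reduced[0]), "terms")
print("sparsity:", round(sparsity(reduced), 3))
```

Applied to the real document-term matrix, the same two steps would be expected to yield the dimensions and sparsity level quoted in the description.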